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Abstract 

An  autonomous  vehicle  must  accurately  observe  its  location  within  the  environ¬ 
ment  to  interact  with  objects  and  accomplish  its  mission.  When  its  environment  is 
unknown,  the  vehicle  must  construct  a  map  detailing  its  surroundings  while  using  it 
to  maintain  an  accurate  location.  Such  a  vehicle  is  faced  with  the  circularly  defined 
Simultaneous  Localization  and  Mapping  (SLAM)  problem.  However  difficult,  SLAM 
is  a  critical  component  of  autonomous  vehicle  exploration  with  applications  to  search 
and  rescue.  To  current  knowledge,  this  research  presents  the  first  SLAM  solution  to 
integrate  stereo  cameras,  inertial  measurements,  and  vehicle  odometry  into  a  Mul¬ 
tiple  Integrated  Navigation  Sensor  (MINS)  path.  The  implementation  combines  the 
MINS  path  with  LIDAR  to  observe  and  map  the  environment  using  the  FastSLAM 
algorithm.  In  real-world  tests,  a  mobile  ground  vehicle  equipped  with  these  sensors 
completed  a  140  meter  loop  around  indoor  hallways.  This  SLAM  solution  produces 
a  path  that  closes  the  loop  and  remains  within  1  meter  of  truth,  reducing  the  error 
92%  from  an  image-inertial  navigation  system  and  79%  from  odometry  FastSLAM. 
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MULTIPLE  INTEGRATED  NAVIGATION  SENSORS 
FOR  IMPROVED  OGGUPANGY  GRID  FASTSLAM 

I.  Introduction 

As  the  environment  of  warfare  shifts  from  open  helds  to  urban  cities,  it  becomes 
increasingly  difficult  and  dangerous  for  human  forces  to  occupy  and  control  the  bat- 
tleheld.  Recent  increases  in  vehicle  autonomy  have  allowed  the  warhghter  to  take 
a  step  away  from  the  danger.  Pilots  and  soldiers  now  extensively  control  Remotely 
Piloted  Aircraft  (RPA)  and  Explosive  Ordinance  Disposal  (EOD)  robots  to  remove 
themselves  from  some  of  the  most  vulnerable  situations  [2], 

While  their  use  is  undisputedly  saving  lives,  these  existing  systems  often  require  a 
large  team  to  operate  and  lack  the  autonomy  that  is  often  associated  with  robots  [2], 
Increasing  autonomy  in  these  vehicles  would  alleviate  the  need  for  such  a  large  support 
team.  Reducing  the  number  of  people  involved  allows  the  military  to  deploy  more 
vehicles  where  they  are  needed  and  require  less  human  operators  and  maintainers. 

Additional  situations  in  future  operations  also  beneht  from  the  use  of  autonomous 
vehicles.  Existing  platforms  are  incapable  of  target  identihcation  or  Gombat  Search 
and  Rescue  (GSAR)  missions  in  environments  that  demand  ground  transportation. 
An  urban  area  GSAR  mission  requires  that  a  vehicle  maintain  an  accurate  position 
of  itself  while  navigating  around  obstacles  to  its  eventual  goal. 

To  increase  the  autonomy  of  these  vehicles,  they  must  be  able  to  navigate  without 
the  direct  control  of  a  driver.  Each  of  these  applications  requires  the  vehicle  to  observe 
its  environment  and  store  a  representation  to  pinpoint  the  location  of  a  target  within 
the  environment  and  act  on  it.  Whether  the  vehicle  must  report  on  an  adversary. 
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handle  explosives,  or  rescue  a  casualty,  it  needs  to  use  a  map  to  accomplish  its  mission. 
In  addition  to  external  sensors,  this  requires  a  precise  navigation  position. 

The  warfighter  of  today  navigates  almost  exclusively  through  the  use  of  the  Global 
Positioning  System  (GPS)  network.  Since  the  advent  of  the  GPS  satellite  constel¬ 
lation,  GPS  receivers  have  afforded  the  military  an  invaluable  resource  for  precise 
navigation.  This  technology  has  a  vast  number  of  applications  for  both  military  and 
commercial  users,  and  has  enabled  wildly  accurate  global  navigation. 

However,  this  capability  has  also  created  a  significant  dependence  on  accurate 
navigation.  Over  time,  many  systems  and  many  people  have  come  to  rely  on  GPS 
to  accomplish  their  mission.  Without  a  signal,  not  only  would  many  drivers  end  up 
lost,  but  many  military  weapon  systems  would  also  be  unable  to  function. 

Unfortunately,  although  the  GPS  signal  is  global,  it  is  not  available  everywhere.  In 
bad  weather,  inside  most  building,  between  tall  structures,  and  underground,  a  GPS 
signal  is  unreliable  and  often  too  attenuated  to  be  usable.  An  additional  consideration 
is  the  adversaries  ability  to  jam  the  GPS  signal  and  render  it  unusable.  Because  of 
the  number  of  capabilities  now  defined  by  their  accuracy,  the  military  needs  other 
means  to  achieve  similar  precision  navigation  in  all  operating  environments. 

1.1  Motivation 

A  vehicle  operating  in  areas  without  GPS  requires  a  navigation  method  that 
functions  without  using  external  signals.  This  means  the  vehicle  must  rely  on  its  own 
sensors  to  explore  an  environment  while  maintaining  an  accurate  position.  Gurrent 
missions  demand  an  accurate  position  to  always  be  available,  especially  when  a  vehicle 
relays  that  location  to  its  own  operators  or  to  an  allied  vehicle. 

If  the  concept  of  internal  navigation  sounds  unlikely,  consider  comparing  current 
technology  to  biological  systems.  Humans  and  other  animals  are  able  to  navigate 
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in  both  known  and  unknown  environments  with  incredible  accuracy.  From  a  design 
standpoint,  biological  navigation  amounts  chiefly  to  stereo  vision  coupled  with  touch 
and  inertial  sensors. 

Since  the  goal  is  to  obtain  a  similar  level  of  navigation,  consider  sensor  config¬ 
urations  attempts  at  mimicking  the  abilities  of  animals.  That  is,  using  cameras  to 
see  and  an  inertial  unit  to  sense  movement.  Combine  these  sensors  and  the  result 
is  an  artificial  navigation  vehicle.  When  external  signals  are  unavailable  due  to  the 
physical  environment  or  the  actions  of  adversaries,  the  scenario  demands  the  use  of 
local  sensors.  An  autonomous  vehicle  of  this  kind  can  be  used  to  explore  and  collect 
information  or  even  take  action  if  necessary.  Even  in  hostile  environments,  an  in¬ 
ternally  navigating  vehicle  can  accomplish  its  mission  without  relying  on  vulnerable 
external  means. 

1.2  Problem  Definition 

In  mobile  robotics,  the  Simultaneous  Localization  and  Mapping  (SLAM)  problem 
addresses  where  a  vehicle  is  and  what  its  environment  contains  [60].  A  functional 
SLAM  solution  is  a  key  component  of  an  autonomous  vehicle,  especially  one  whose 
mission  involves  gaining  knowledge  of  unknown  areas.  It  provides  the  ability  to 
explore  unknown  environments  without  prior  knowledge  and  report  on  its  findings. 
Because  of  the  sensors  involved,  the  solution  does  not  rely  on  any  external  signals 
commonly  used  for  navigation  or  communication. 

The  SLAM  problem  is  often  represented  as  a  Dynamic  Bayes  Network  (DBN) 
from  time  0  to  time  t  consisting  of  the  vehicle  state,  control  inputs  Ut,  measurement 
observations  Zt,  and  external  landmarks  6  [46].  Figure  1  illustrates  their  relationship 
in  a  DBN. 

At  time  0,  the  vehicle  maintains  xq  and  makes  observation  zo,  which  corresponds 
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Figure  1.  The  SLAM  problem  as  a  dynamic  Bayes  network.  Vehicle  states  xo:t,  controls 
ui:t,  and  observations  zo:t:  have  subscripts  denoting  time,  with  states  Xo:t  highlighted  to 
show  conditional  independence.  Landmarks  9  are  numbered  arbitrarily  [8,  46]. 

to  a  physical  landmark  6i.  The  vehicle  is  then  given  a  control  command  ui,  which 
results  in  a  new  state  xi  where  Zi  observes  the  same  landmark.  After  receiving  U2, 
the  new  vehicle  observation  Z2  corresponds  to  a  different  landmark  62-  In  this  way, 
Ut  results  in  new  positions  and  Zt  can  observe  previous  landmarks  in  addition  to  new 
ones. 

This  DBN  arrangement  results  in  an  important  property  of  the  network.  Because 
each  node  xi  to  x*  of  the  vehicle  path  xq,*  is  a  parent  to  each  observation  zo:t,  the 
observations  are  independent  from  each  other  given  the  vehicle  path  and  physical 
landmark  locations  [52].  Algorithms  make  use  of  this  independence  property  when 
producing  SLAM  solutions  by  using  only  the  current  path  node  and  separating  ob¬ 
served  landmarks  [46]. 

The  DBN  also  illustrates  another  important  aspect  of  the  SLAM  problem.  State 
Xt  increments  with  ut  and  Zt,  depending  only  on  Xt_i  and  external  landmarks  9. 
Thus,  the  illustration  makes  the  Markov  assumption  that  all  relevant  information  is 
captured  in  the  current  state  [52].  A  SLAM  algorithm  that  makes  this  assumption 
is  called  an  online  solution  [61]  and  only  uses  the  current  ut  and  Zt.  This  research 
is  focused  on  online  solutions,  treating  the  problem  as  illustrated  in  Figure  1,  and 
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maintaining  a  current  belief  at  each  time  t,  an  important  property  of  a  mapping 
vehicle. 

Alternatively,  algorithms  that  store  previous  measurement  and  state  data  to  cal¬ 
culate  together  are  offline  solutions.  They  do  not  make  the  Markov  assumption  and 
thus  store  all  observed  data  and  state  information.  Therefore,  offline  solutions  re¬ 
quire  more  memory  then  online  solutions  and  make  calculations  once  all  data  has 
been  gathered  [61]. 

SLAM  is  often  described  fundamentally  as  a  paradoxical  problem.  In  order  for  a 
vehicle  to  answer  what  an  environment  looks  like,  it  must  know  where  it  is  relative 
to  its  surroundings.  Likewise,  in  order  to  answer  the  question  of  where  it  is,  the 
vehicle  must  know  its  environment  to  place  itself.  This  description  presents  the  SLAM 
problem  to  require  an  iterative  solution.  To  remain  an  online  solution,  a  probabilistic 
method  accounts  for  errors  and  compensates  in  an  attempt  to  minimize  them. 

Many  current  SLAM  algorithms  use  two  sensors  to  take  measurements.  The  hrst 
uses  encoders  on  the  motors  and  wheels  to  measure  its  position.  This  is  called  vehicle 
odometry,  and  due  to  its  nature,  it  produces  a  path  subject  to  compounding  error 
the  solution  seeks  to  reduce.  The  second  sensor  is  any  that  measures  ranges  from 
the  vehicle  to  objects  in  the  environment.  These  errors  do  not  compound  as  they  are 
relative  to  the  vehicle.  Because  of  this  disparity  in  measurement  accuracy,  the  most 
common  source  of  error  is  that  in  the  path. 

A  straightforward  SLAM  implementation  approaches  the  problem  by  measuring 
position,  then  applying  ranges  to  this  position.  In  this  approach,  path  errors  become 
map  errors  when  incorporating  range  observations  and  go  uncorrected.  This  strategy 
of  not  correcting  position  only  satishes  half  of  the  fundamental  SLAM  description  and 
is  called  a  loose  coupling  of  the  sensors.  For  accuracy,  the  SLAM  problem  demands 
more  tightly  coupled  sensors  in  a  solution. 
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1.3  Research  Goals 


Previous  researchers  have  already  developed  an  internal  navigation  system  using 
vision  and  inertial  data  without  mapping  [65].  One  arrangement  consists  of  stereo 
cameras  tightly  coupled  with  an  inertial  sensor  to  produce  an  accurate  position  on 
many  different  platforms  in  real-time  [23].  From  here  on,  it  is  called  the  Fletcher- 
Veth  Scale-Invariant  Feature  Transform  (FV-SIFT),  named  after  its  developers  and 
the  SIFT  feature  tracking  algorithm  [40]. 

It  uses  a  Kalman  hltering  method  to  achieve  a  functional  path.  Specihc  situations 
adversely  affect  the  system  due  to  its  dependence  on  good  images  and  smooth  move¬ 
ment  to  achieve  its  path  [64].  A  wheeled  ground  vehicle  can  mitigate  such  problems 
by  using  inputs  from  the  vehicle  itself,  but  this  previous  system  does  not  incorporate 
additional  input  sensors. 

The  goal  of  this  research  is  to  introduce  two  dimensional  mapping  capability  to 
FV-SIFT.  A  vehicle  carries  the  operating  navigation  hardware  and  a  novel  solution 
incorporates  the  inputs  from  stereo  cameras,  an  inertial  sensor,  and  vehicle  odometry 
into  a  single  path. 

This  work  requires  any  mobile  robotic  vehicle  that  has  the  required  computational 
and  sensory  equipment.  This  includes  all  FV-SIFT  hardware,  a  laser  range  hnder, 
and  wheeled  odometry.  The  indoor  hallway  environment  provides  real  test  data 
comparable  to  popular  data  sets  from  recent  research  [29]. 

1.4  Vision  Navigation  &;  Mapping 

This  research  combines  the  sensory  inputs  of  two  prominent  strategies  into  a  single 
solution.  It  makes  use  of  the  odometry  and  laser  ranges  of  the  vehicle  with  stereo 
images  and  inertial  measurements  of  the  FV-SIFT  hardware  into  a  two  dimensional 
mapping  algorithm.  The  solution  combines  motion  from  integrated  inertial  data. 
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extracted  from  stereo  camera  images,  and  measured  by  odometry  together  in  a  linear 
Kalman  filter  to  create  a  single  path.  A  particle  hlter  then  uses  that  path  with  laser 
ranges  to  produce  a  map  of  the  traveled  environment. 

This  research  builds  on  the  earlier  work  that  developed  FV-SIFT  with  the  addi¬ 
tion  of  mapping  functionality  to  improve  its  navigation  capabilities.  This  work  also 
supports  current  research  in  SLAM  by  providing  a  method  to  combine  inertial,  op¬ 
tical  (camera),  and  mechanical  (odometry)  sensors  into  a  single  solution  and  apply 
known  mapping  techniques  for  multiple  inputs. 

With  more  sensors  to  estimate  a  position,  this  research  produced  a  more  accurate 
path  with  92%  less  error  than  FV-SIFT  and  79%  less  error  than  mapping  with  vehicle 
odometry  in  conducted  tests.  The  paths  compare  to  a  building  floor  plan  since  the 
vehicle  operator  knows  the  approximate  path  taken.  The  operator  measures  the 
accuracy  of  a  path  by  determining  the  distance  from  known  locations.  Figure  2 
displays  a  floor  plan  of  the  hallway  environment  with  the  approximate  true  path 
taken.  Certain  points  have  been  surveyed  to  serve  as  the  reference  points  for  this  true 
path. 

Unfortunately,  the  operator  cannot  know  exact  truth  values  at  every  point,  as 
existing  external  systems  like  GPS  are  unavailable  in  the  testing  environment.  This 
serves  to  reinforce  the  purpose  of  the  research  and  illustrates  the  need  for  internal 
navigation. 

In  addition  to  an  accurate  path,  the  solution  also  constructs  a  map  of  the  environ¬ 
ment  traveled.  It  implements  a  particle  hltering  technique  to  further  adjust  the  path. 
The  algorithm  incorporates  laser  range  scans  to  observe  obstacles  such  as  walls  or 
doorways  and  place  them  in  the  map.  The  overall  strategy  maintains  many  different 
maps  that  represent  different  beliefs  of  the  environment.  The  particle  hlter  incorpo¬ 
rates  the  second  half  of  the  SLAM  problem  description  by  duplicating  maps  with  the 
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Figure  2.  A  floor  plan  of  the  building  environment  with  surveyed  locations.  The  desired 
path  of  an  accurate  solution  is  the  approximate  truth  looping  around  the  hallways. 

highest  chances  of  being  correct  and  removing  ones  inconsistent  with  measurements. 

1.5  Research  Applications  &;  Summary 

A  vehicle  equipped  with  this  system  is  capable  of  independently  exploring  an  un¬ 
known  location  and  reporting  on  that  environment.  It  can  return  from  an  exploration 
mission  having  produced  a  current,  accurate  map  of  the  area.  This  map  would  provide 
decision  makers  with  the  most  current  information  for  immediate  planning  decisions. 

Such  a  ground  vehicle  provides  new  search  and  rescue  capabilities  in  multiple 
environments.  The  implementation  presented  only  discusses  testing  indoors,  but  the 
same  solution  applies  in  any  relatively  flat  area.  Future  research  will  likely  extend 
SLAM  capabilities  to  three  dimensions,  building  an  even  more  complete  picture  of 
the  unknown  environment. 
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An  autonomous  ground  vehicle  would  be  able  to  explore  dangerous  locations  where 
the  safety  of  personnel  warrants  the  use  of  unmanned  vehicles.  It  allows  operators 
the  ability  to  remove  themselves  from  potentially  hazardous  situations  and  lowers  the 
risk  to  human  life. 

This  research  also  applies  to  environments  where  GPS  is  unavailable  either  tem¬ 
porarily  or  permanently.  Deploying  a  map-building  autonomous  vehicle  gives  its  op¬ 
erators  a  platform  that  does  not  rely  on  GPS  or  external  signals  to  explore  unknown 
areas  of  interest.  In  this  way,  a  map  can  serve  to  measure  distances  and  navigate 
without  requiring  external  communication. 

The  following  chapter  provides  a  background  on  work  related  to  this  topic  that 
has  been  researched  or  demonstrated  previously.  Ghapter  3  presents  details  of  the 
algorithms  and  implementations  used  in  the  research  to  produce  paths  and  maps  from 
sensor  inputs.  Ghapter  4  describes  the  specific  parameters  used,  data  collected,  results 
of  this  research,  and  comparison  to  alternate  methods.  The  hnal  chapter  comments 
on  what  can  be  done  to  further  this  research  area  with  potential  topics  to  explore. 
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II.  Background  &:  Related  Work 


This  work  builds  on  existing  algorithms  for  using  camera  images  and  other  sensors 
for  navigation  and  mapping.  In  addition  to  the  field  of  robotic  mapping,  this  includes 
the  fields  of  vehicle  navigation,  image  processing,  and  machine  vision.  Recent  research 
in  each  of  these  areas  has  resulted  in  significant  progress.  This  research  extends  this 
work  by  combining  these  research  areas. 

The  metholodology  presented  in  Chapter  3  uses  most  but  not  all  of  the  concepts, 
techniques,  and  strategies  of  this  chapter.  The  implementation  combines  odometry, 
a  camera  path,  and  inertial  measurements  in  a  linear  Kalman  filter  that  produces  the 
path  necessary  for  localization.  It  also  implements  a  particle  filter  that  uses  the  path 
and  a  range  sensor  to  build  a  grid  map  of  the  environment.  This  chapter  explains  the 
map  representation  and  the  particle  filter  separately. 

The  first  sections  of  this  chapter  describe  different  strategies  for  storing  the  map 
and  representing  position.  This  is  followed  by  foundations  in  localization  algorithms 
as  the  first  component  of  the  SLAM  problem.  Next,  this  chapter  discusses  mapping 
algorithms  and  more  recent  methods  to  improve  them.  Following  the  presentation 
of  machine  vision  algorithms  used  to  generate  and  track  image  features,  the  core  of 
image  navigation  solutions  are  discussed.  Finally,  this  chapter  covers  techniques  used 
to  construct  maps  from  inputs  gathered  by  camera  hardware. 

2.1  Representing  Maps 

A  common  decision  a  SLAM  solution  must  make  is  how  to  represent  the  environ¬ 
ment  internally.  The  map  itself  is  the  most  important  result,  as  this  serves  as  both  the 
most  visible  output  and  the  environment  for  later  navigation.  Although  this  chapter 
discusses  more  advanced  strategies  later,  there  are  two  basic  ways  to  represent  a  map 
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of  the  environment;  metric  and  topological  maps. 

One  method  to  represent  the  map  is  metric,  where  objects  are  stored  by  their 
absolute  position.  This  is  commonly  done  by  maintaining  an  occupancy  grid  of  which 
locations  are  open  and  which  contain  obstacles.  Grid  squares  may  have  centimeter 
resolution,  which  adds  up  when  map  sizes  approach  the  kilometer  range.  Storing  a 
global  metric  map  such  as  this  carries  a  signihcant  memory  requirement  which  can 
affect  computation  as  much  as  complexity,  so  several  algorithms  have  been  proposed 
to  reduce  this  requirement  [46]. 

Instead  of  the  common  metric  map,  another  strategy  is  the  topological  map  rep¬ 
resentation.  One  solution  takes  an  approach  similar  to  early  research  but  changes 
its  map  representation  [12].  This  means  that  landmarks  are  placed  based  on  their 
location  relative  to  others.  This  difference  in  maps  continues  to  be  a  divided  area, 
with  solutions  using  both  methods  of  representation. 

This  research  uses  an  occupancy  grid  metric  map,  as  displayed  in  several  successful 
implementations  [28,  29].  A  metric  map  is  easy  to  interpret  and  measure,  and  the 
grid  arrangement  allows  a  straightforward  implementation  on  a  computer  [33]. 

Even  with  a  fast  algorithm,  any  solution  must  represent  and  update  its  map, 
especially  if  one  is  maintained  for  each  particle  as  presented  in  this  research.  Because 
larger  maps  demand  more  particles  for  consistency,  this  memory  requirement  grows, 
and  is  most  often  the  most  signihcant  contribution  to  computation  time. 

2.2  Representing  Positions 

To  represent  vehicle  position,  both  a  location  and  an  orientation  are  needed.  A  two 
dimensional  coordinate  system  requires  three  variables,  whereas  a  three  dimensional 
system  requires  six  variables.  In  matrix  representation,  six  variables  means  a  6^  = 
36  element  matrix  multiplication,  so  keeping  the  dimension  low  reduces  the  data 
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requirements  to  3^  =  9  elements. 

A  two  dimensional  position  consists  of  an  x  and  a  y  coordinate.  Because  orienta¬ 
tion  is  desired,  the  vehicle  must  also  track  a  heading  angle  9.  Together,  these  three 
elements  make  up  a  vehicle  pose.  A  two  dimensional  pose  is  represented  by  vector  s 
in  Equation  1. 


Due  to  the  geometric  nature  of  the  pose  dehnition,  the  two  dimensional  trigonom¬ 
etry  of  the  pose  determines  their  addition  and  subtraction.  To  add  a  pose  s“  to 
another  s^,  the  result  is  not  straightforward  vector  addition  but  instead  dehned  by 
the  product  of  a  rotation  matrix  in  Equation  2. 


cos(6'“) 

—  sin(6*“) 

0 

s“  +  s'*  = 

ya 

+ 

sin(6*“) 

cos(6*“) 

0 

yb 

ga 

0 

0 

1 

9^ 

(2) 


For  notation  purposes,  pose  superscripts  transfer  to  its  elements.  In  the  same 
manner  as  addition,  subtraction  of  a  second  pose  from  an  initial  pose  s“  uses 
another  rotation  matrix  product  as  expressed  in  Equation  3. 


cos(6*^)  sin(6*^)  0 

Xa  _ 

s“  -  s'*  = 

—  sin(6*^)  cos(6*^)  0 

y<^  -yb 

0  0  1 

ga  _  gb 

(3) 


In  subsequent  equations,  all  pose  additions  and  subtractions  use  their  respective 
operation  from  Equations  2  and  3.  This  pose  dehnition  allows  for  easy  communica¬ 
tion  with  the  environment  as  represented  in  a  two  dimensional  grid.  This  research 


12 


leverages  this  useful  property  as  it  applies  to  the  occupancy  grid  for  mapping. 


2.3  Vehicle  Localization 

Vehicle  localization  focuses  on  determining  where  a  vehicle  is  within  a  dehned  area. 
It  is  often  difficult  for  humans  to  do  this  given  a  blueprint  of  a  building  or  a  map  of  a 
cave,  and  autonomous  vehicles  have  many  of  the  same  problems.  The  most  prevalent 
solutions  approach  this  problem  from  a  probabilistic  point  of  view  [15].  Probabilistic 
localization  algorithms  redehne  the  localization  solution  to  be  the  probability  that  a 
vehicle  occupies  any  given  location. 

To  localize  in  an  environment  at  time  t,  the  current  vehicle  pose  St  is  combined 
with  all  measurements  Zq-i  as  a  Bayesian  hltering  problem  [15].  The  hlter  represents 
an  estimate  of  the  posterior  probability  p{st\zo:t)  at  each  time  t.  With  each  new  time 
t,  it  updates  s*  with  a  motion  model  discussed  later  in  this  chapter,  then  an  update 
phase  hnds  the  posterior  by  the  Bayes  theorem: 


p{st\zo.,t)  = 


p{zt\st)  p{St\Zo:t-l 


(4) 


p{St\zo.,t-l) 

In  this  way,  the  sensor  measurements  are  incorporated  to  obtain  a  pose  estimate.  It 
is  important  to  note  that  the  above  calculation  is  only  possible  under  the  assumption 
that  Zt  is  conditionally  independent  of  all  previous  observations  zo:t-i  [52]. 

For  comparison,  consider  how  humans  navigate  a  building  with  a  map  but  without 
a  location  marked.  One  would  have  to  use  their  senses  to  narrow  down  a  location; 
if  eyesight  is  unavailable,  the  situation  is  the  same  for  the  blind.  Related  research 
details  how  the  same  concepts  discussed  for  vehicles  can  successfully  be  used  to  create 
a  solution  humans  can  use  for  themselves  [31]. 

For  these  strategies,  localization  requires  previous  knowledge  of  the  environment, 
such  as  the  floor  plan  of  a  building  or  map  of  a  park.  If  designing  a  vehicle  to  operate 
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in  another  location,  it  needs  a  map  for  the  new  environment  to  localize  to.  If  a  current 
map  of  the  building  or  area  exists,  this  would  not  be  possible,  because  there  is  no 
way  of  knowing  the  correctness  of  a  provided  map.  A  solution  can  only  assume  it  is 
accurate  and  result  in  a  false  solution  if  the  map  has  errors. 

2.4  Simultaneous  Localization  &;  Mapping 

Because  localization  suffers  from  any  inaccuracies  in  the  map  provided,  most  re¬ 
search  extends  the  problem  to  include  map  building  [57].  The  challenge  is  therefore 
to  perform  Concurrent  Mapping  and  Localization  (CML),  more  commonly  called  the 
Simultaneous  Localization  and  Mapping  (SLAM)  problem  [39]. 

Perhaps  the  most  difficult  aspect  of  SLAM  is  its  cyclical  solution.  In  order  for  a 
vehicle  to  build  a  map  of  its  surroundings,  an  accurate  position  is  required.  However, 
as  stated  in  the  last  section,  localization  is  only  possible  when  a  map  is  available  [60]. 
This  explains  why  solutions  have  only  been  presented  recently,  as  they  rely  on  prob¬ 
abilistic  methods  combined  with  accurate  hardware  and  fast  computation  [18].  This 
has  been  the  case  for  early  methods  and  more  recent  algorithms  alike  [58]. 

SLAM  solutions  iterate  by  hrst  estimating  a  pose  within  an  existing  map,  and 
then  updating  the  map  to  incorporate  new  measurements.  This  is  the  general  ap¬ 
proach  taken  after  an  early  solution  proved  that  a  SLAM  solution  both  exists  and 
converges  [16].  Its  estimation  approach  to  representing  vehicle  state  is  a  principle 
that  nearly  all  subsequent  methods  use. 

The  SLAM  posterior  at  time  t  is  similar  to  the  localization  posterior,  but  includes 
more  variables.  SLAM  requires  the  current  pose  Sj,  all  measurements  zo-t,  the  map 
0,  and  all  control  inputs  Ui,t.  Equation  5  shows  the  posterior  calculation  as  that  of 
a  Bayes  Liter,  where  r]  is  a.  normalization  constant. 
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p{St,  Q\Zo:t,  Ui:t)  =  V  p{Zt\St,  0) 


p(S4|S4_i,  Mi)p(St_i,  Q\Zo:t-l,  (5) 


The  posterior  typically  includes  a  term  for  data  association,  but  this  research 
removes  it  due  to  differences  in  map  representation.  The  data  association  problem 
requires  the  algorithm  to  correctly  map  each  observation  zt  to  each  landmark  6  E  Q 
for  an  accurate  solution.  Because  this  research  saves  0  as  a  metric  map  rather  than  a 
set  of  landmarks,  this  association  can  be  considered  part  of  the  measurement  model 
presented  later. 

Because  evaluating  the  integral  in  Equation  5  is  computationally  prohibitive, 
SLAM  solutions  use  statistical  estimating  techniques  to  approximate  the  hlter  [45]. 
A  common  approach  to  estimate  Equation  5  uses  an  Extended  Kalman  Filter  (EKE), 
which  linearizes  differences  in  vehicle  motion  to  estimate  a  current  pose  and  an  asso¬ 
ciated  error  [16].  The  primary  issue  with  an  EKE  algorithm  is  its  matrix  storage 
requirement  for  an  area  with  K  landmarks,  with  up  to  a  0{K^)  time  complexity  for 
matrix  operations. 

This  does  not  scale  well  in  areas  that  produce  large  maps  if  many  landmarks  are 
used.  Concerned  about  the  consistency  of  results,  a  newer  algorithm  was  introduced 
to  reduce  EKE  linearization  errors,  producing  a  slightly  more  accurate  estimate  [11]. 
Other  algorithms  attempt  to  further  reduce  this  error  by  using  an  UKF  to  produce 
better  results  [19]. 

Some  critical  examination  of  the  hlters  finds  limitations  to  the  linearization  as 
well  as  the  tendency  for  the  estimate  to  become  overly  optimistic  [3].  This  results 
in  less  error  than  needed,  resulting  in  an  increasingly  inaccurate  estimate.  This  flaw 
is  present  regardless  of  whether  an  EKE,  an  iterated  version  of  the  EKE,  or  a  UKF 
is  used.  Neither  improvement  makes  any  change  to  the  complexity  from  the  K  x  K 
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error  matrix  in  a  solution  with  K  landmarks. 

An  improved  EKF  SLAM  solution  uses  incremental  matrix  factorization  [36].  It 
seeks  to  keep  the  information  matrix  in  an  upper  triangular  form  and  the  useful  data 
together,  reducing  the  effective  complexity.  The  reduced  computation  allowed  for  a 
better  map  when  the  vehicle  returned  to  the  same  location  multiple  times,  closing 
the  loop  in  the  path. 

Some  EKF  based  SLAM  implementations  take  many  hours  to  compute  a  simple 
vehicle  path  on  modern  machines.  Simply  due  to  its  quadratic  running  time,  this 
removes  any  considerations  of  a  real-time  implementation.  For  this  reason,  the  EKF 
strategy  for  landmark  mapping  is  abandoned  in  this  research  for  a  different  approach 
using  a  particle  hlter. 

2.4.1  Particle  Filtering. 

To  take  a  different  approach  to  the  SLAM  problem,  recall  its  DBN  representation 
in  Figure  1.  Its  structure  allows  each  state  to  represent  all  information  at  the  current 
time  t,  and  each  observation  zt  and  control  Ut  to  be  conditionally  independent  [52]. 

Because  of  the  conditional  probabilities  of  the  DBN,  the  integral  in  Equation  5 
can  be  rewritten  as  the  product  shown  in  Equation  6  [52].  This  factors  the  SLAM 
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posterior  into  one  pose  posterior  and  K  landmark  estimates  [45]. 

K 

p{St,Q\Zo-,t,Ui-t)  =  p{St\Zt,Ut)Y\_P{(^k\St,  Z0:t,Ui:t)  (6) 

k=l 

A  different  approach  to  approximating  this  posterior  is  to  use  a  particle  hlter.  A 
particle  hlter  maintains  the  error  distribution  through  many  samples,  or  particles, 
each  storing  a  pose  estimate  and  landmark  locations.  This  is  in  contrast  to  the  large 
error  matrix  in  an  EKF.  Instead  of  maintaining  one  estimate  with  covariances,  many 
particles  each  keep  one  estimate,  creating  a  discrete  distribution  of  the  desired  error. 

Sampling  M  independent  particles  [m]  from  the  posterior  results  in  an  empir¬ 
ical  estimate  given  in  Equation  7,  where  S  is  the  Dirac  delta  function  at  location 
{gH,0M}  This  sampling  removes  the  conditional  probabilities  on  and  u, 

while  covering  the  desired  distribution  by  the  law  of  large  numbers. 


p{s,e\z,u) 


1 

M 


M 

m=l 


ds  d& 


(7) 


The  very  hrst  presentation  of  the  particle  hlter  notes  its  application  to  SLAM, 
performing  better  than  exact  inference  [17].  The  basic  strategy  to  include  many 
particles  [m]  works  well  initially,  but  each  time  error  is  added,  the  distribution  spreads 
further  apart  and  becomes  less  accurate  [61].  This  potential  complication  is  mitigated 
through  resampling,  the  key  aspect  of  this  hlter.  Resampling  ahects  the  distribution 
by  deleting  lowly  weighted  (high  error)  particles  and  copying  highly  weighted  (low 
error)  ones  in  a  similar  manner  to  genetic  algorithms. 

The  specihc  hlter  discussed  in  this  work  is  a  Rao-Blackwellized  Particle  Filter 
(RBPF),  named  after  the  factorization  method  of  marginalizing  out  variables.  Using 
this  factorization  with  resampling  reduces  the  number  of  particles  required  to  main¬ 
tain  the  desired  distribution.  The  RBPF  maintains  the  set  P  of  M  particles  to  be 
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distributed  across  the  location  posterior  [17].  Each  step  of  the  RBPF  is  listed  in 
Algorithm  1  for  each  time  t. 


Algorithm  1  Generic  RBPF  Operation 
for  all  times  t  do 

/ /  sequential  importance  sampling 
for  all  particles  [m]  E  P  do 

Sample  from  a  simple  distribution 
Include  in  the  set  of  previous  samples 
Calculate  and  normalize  the  importance  weight 
end  for 

/ /  selection  step 

Reproduce  with  high  and  suppress  with  low 
M  samples  are  now  distributed  along  the  posterior 
/ /  markov  chain  monte  carlo 
Apply  transition  probabilities  to  obtain  each 

end  for 


A  RBPF  first  performs  Sequential  Importance  Sampling  (SIS)  by  drawing  from  a 
Gaussian  proposal  distribution.  It  then  calculates  an  importance  weight  for  each 
particle  [m]  and  uses  a  Sampling  Importance  Resampling  (SIR)  algorithm  to  select  a 
new  set  of  samples  in  Figure  3  [45].  In  terms  of  the  SLAM  posterior,  is  the  ratio 
of  distributions  at  each  sample  location  as  shown  in  Equation  8. 

[m]  target  distribution  p{s^l^^\zt,Ut) 

-  7  T~-  ^  ^  -  T  i 

proposal  distribution  p(^s^  \zt-i,Ut) 

Researchers  soon  applied  the  RBPF  to  SLAM  seeking  to  reduce  the  computational 
complexity  of  the  EKF  sensor  update  [46] .  This  method  is  appropriately  named  Fast- 
SLAM  and  achieves  a  running  time  of  0{M  log  K)  for  M  particles  and  K  landmarks. 
This  makes  it  able  to  handle  much  larger  environments  and  still  be  feasibly  compu¬ 
tational.  The  FastSLAM  algorithm  completes  three  parts  for  each  particle  in  each 
time  step  t  [45].  This  process  is  shown  with  resampling  in  Algorithm  2. 

To  map  larger  environments,  the  strategy  simply  increases  the  number  of  particles 
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Samples  from 
proposal  distribution 


Weighted  stimples 


Figure  3.  An  example  of  sampling  and  weighting.  The  algorithm  draws  from  a  proposal 
Gaussian  distribution  (dashed  line),  then  weights  each  sample  to  achieve  the  target 
posterior  distribution  (solid  line)  [45]. 


Algorithm  2  Basic  FastSLAM  Procedure 
for  all  times  t  do 

for  all  particles  [m]  G  P  do 

Sample  new  pose  si™'  given  control  Ut  / /  motion  model 
Update  landmarks  corresponding  to  observation  Zt  j  j  measurement  model 
Assign  weight  to  each  [m]  / /  weighting 
end  for 

Resample  P  according  to  each  weight 

end  for 


in  the  RBPF  to  produce  a  consistently  accurate  map.  FastSLAM  is  not  without  fault, 
as  a  large  number  of  particles  slows  computation  time.  Some  analysis  further  explains 
that  increasing  the  map  size  will  eventually  cause  any  number  of  particles  to  become 
inconsistent  [4]. 

FastSLAM  therefore  contains  a  balance  between  speed  and  quality,  dependent  on 
how  many  particles  are  used.  Although  an  improved  version  incorporates  a  smaller 
particle  distribution  closer  to  the  measurement,  it  still  must  determine  a  required 
number  of  particles  [47].  Even  as  they  show  the  second  version  converges  with  a 
single  particle,  note  that  many  particles  are  still  required  for  a  consistent  practical 
implementation.  Its  quality  also  depends  greatly  on  the  accuracy  of  the  sensors  and 


models  within  it.  Regardless,  FastSLAM  remains  a  scalable  implementation. 

One  problem  with  RBPF  solutions  is  that  a  large  enough  path  results  in  mea¬ 
surement  errors  outside  the  particle  distribution.  This  results  in  a  map  unable  to 
close  large  loops  and  perhaps  intersecting  its  path  at  an  incorrect  location.  Other 
algorithms  have  been  presented  to  combat  this  effect,  one  method  using  distributed 
particles  within  a  different  filter  [20] .  It  has  a  complicated  running  time  and  requires 
a  comparable  number  of  particles,  yet  produces  a  robust  map.  A  later  publication 
improves  the  algorithm  complexity  to  a  level  not  greater  than  a  localization  solu¬ 
tion  [21]. 

Another  SLAM  solution,  GraphSLAM,  seeks  to  use  past  measurement  and  path 
data  to  update  the  map,  instead  of  just  the  most  recent  [62].  By  representing  the 
posterior  as  a  graph  of  nodes,  this  algorithm  performs  better  in  large-scale  environ¬ 
ments.  However,  because  it  uses  previous  data,  it  is  a  full  SLAM  solution  where  a 
map  can  only  be  computed  after  all  measurements  have  been  taken.  This  is  useful  for 
building  a  reference  map  at  a  later  time,  but  cannot  compute  a  map  as  the  vehicle 
explores  the  environment. 

The  authoritative  book  on  the  subject  reviews  both  versions  of  FastSLAM  and 
other  popular  methods  in  great  detail  [61].  Its  authors  compare  each  method  us¬ 
ing  various  data  sets  and  present  their  results,  finding  that  all  SLAM  methods  pro¬ 
vide  different  advantages  but  none  have  demonstrated  themselves  to  be  better  than 
FastSLAM  in  all  situations.  As  such,  FastSLAM  provides  the  foundation  for  the 
implementation  used  in  this  research. 

2.4.2  Filter  Models. 

As  described,  FastSLAM  is  not  a  single  algorithm;  rather,  it  is  a  framework  for 
solving  the  SLAM  problem.  It  specifies  each  of  its  steps  in  terms  of  input  and  ef- 
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feet  and  than  the  implementation  details.  These  steps  are  generally  separate,  and 
replaceable  with  different  system  models  [61]. 

Because  the  resampling  process  is  dictated  by  the  RBPF,  FastSLAM  requires  two 
models.  The  first  is  the  motion  model,  to  incorporate  a  probabilistic  pose  distribution 
using  control  input  Ut,  and  the  second  is  the  measurement  model,  to  update  the  map 
and  weight  this  distribution  using  observation  Zf. 


2.4.2. 1  Motion  Model. 


Of  particular  note  is  the  development  of  a  motion  model  for  vehicle  odometry  [61]. 
This  model  bases  vehicle  motion  upon  the  mechanics  of  a  wheeled  ground  vehicle  as 
is  commonly  used  for  mapping.  Odometry  records  poses  directly  at  a  measurement 
interval,  denoted  by  different  times  t.  During  this  time,  the  vehicle  moves  to  its 
current  pose  St  from  previous  pose  St_i.  Equation  9  depicts  the  difference  between 
poses  (5s  of  interest  in  the  motion  model. 


5s  —  St  —  St-i  — 


5x 

5y 

56 


(9) 


The  purpose  of  the  motion  model  is  to  represent  this  movement  from  time  t  —  1 
to  t  and  record  an  updated  location  and  heading.  Figure  4  shows  an  example  of  pose 
difference  5s  graphically. 

The  motion  model  describes  (5s  as  a  rotation  from  the  old  pose  to  the  new  one,  a 
translation  toward  it,  and  second  rotation  to  the  new  heading  [61].  This  allows  the 
model  to  add  Gaussian  noise  to  the  rotation  and  translation  components  rather  than 
a  multidimensional  oval.  Equation  10  dehne  how  the  three  motion  components  are 
calculated  from  5s. 
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Figure  4.  The  odometry  motion  model  expresses  movement  as  rotation  6roti  toward  a 
new  pose,  translation  Strans  in  that  direction,  and  rotation  Srot2  to  a  new  heading  [61]. 
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Since  these  parameters  are  treated  as  a  measurement,  they  remain  the  same  for 
each  particle.  To  represent  the  distribution,  four  parameters  a  represent  the  expected 
error  between  various  movement.  Each  particle  then  uses  each  a  value  to  calculate 
the  error  applied  to  its  new  pose.  This  is  shown  in  Equation  11,  where  J\f  samples  a 
zero-mean  normalized  Gaussian  distribution  and  multiplication  is  element-wise. 
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Each  particle  [m]  then  adds  the  sampled  difference  to  its  previous  pose  to 
obtain  its  new  in  Equation  12.  This  results  in  particle  set  P  being  distributed 
to  represent  the  desired  pose  error.  The  goal  of  the  particle  hlter  is  that  one  particle 
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retains  the  correct  vehicle  path. 
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All  together,  the  motion  model  propagates  vehicle  pose  separately  for  each  par¬ 
ticle  [m]  using  the  same  odometry  measurement  (5s.  The  complete  motion  model  is 
described  in  Algorithm  3,  where  odometry  measures  the  most  recent  pose  difference 
(5s  and  current  heading  9. 


Algorithm  3  Odometry  Motion  Model 
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end  for 


The  four  a  values  exist  to  vary  the  size  and  shape  of  this  distribution  to  model 
the  anticipated  odometry  error.  A  researcher  sets  the  a  parameters  to  adjust  this 
variance.  A  different  vehicle  often  demands  different  a  parameters  resulting  in  a 
different  distribution  [61]. 

The  distribution  must  be  large  enough  to  cover  the  actual  measurement  error  and 
small  enough  to  require  few  particles.  In  practice,  this  distribution  is  often  bean¬ 
shaped  to  represent  the  error.  This  reflects  the  model  being  more  conhdent  in  Strans 
and  less  in  Sroti  or  Srot2- 

Regardless  of  its  shape,  the  odometry  motion  model  distribution  grows  with  each 
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time  t.  As  the  vehicle  travels  a  certain  distance,  the  pose  uncertainty  grows  and 
the  distribution  models  this  accordingly.  Figure  5  displays  the  effect  of  this  growing 
distribution  on  a  simulated  path  that  travels  a  fixed  distance  forward  and  makes  two 
left  turns. 


Figure  5.  An  example  path  showing  a  particle  distribution  spread  using  the  odometry 
motion  model.  The  vehicle  begins  at  the  arrow  and  travels  along  the  path  [61]. 

This  motion  model  continues  unchanged  in  this  research,  as  described  in  the  follow¬ 
ing  chapter.  Its  a  parameters  allow  the  same  process  to  create  a  plethora  of  different 
distributions  depending  on  the  specihc  odometry  and  environmental  settings.  This 
motion  model  does  not  describe  the  inertial  or  image  inputs  by  themselves  well,  but 
remains  effective  in  creating  a  desired  distribution  when  combined  with  odometry. 

2. 4. 2. 2  Measurement  Model. 

The  motion  model  only  considers  control  input  Ut  to  propagate  the  particles,  and 
does  not  differentiate  between  accurate  and  inaccurate  ones.  To  do  this,  FastSLAM 
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needs  a  measurement  model  to  incorporate  observation  Zt-  Regardless  of  the  specific 
details,  the  measurement  model  must  stratify  particle  set  P  by  incorporating  current 
map  knowledge,  usually  through  zt  [61]. 

The  measurement  model  distinguishes  each  particle  [m]  by  computing  an  error 
value  for  each  one.  FastSLAM  uses  this  error  to  weight  particle  set  P  for  resampling. 
Thus,  the  measurement  model  is  the  critical  process  to  transition  P  from  the  prior 
to  the  posterior  pose  distribution  [28] .  This  implementation  uses  a  range  scan  model 
that  observes  each  grid  cell  along  each  scan  line.  If  a  cell  of  map  ©I’”]  is  not  consistent 
with  current  scan  Zt,  the  model  accumulates  error  for  particle  [m]  [33].  It  builds  up 
error  in  this  manner  for  each  [m]  at  each  time  t  to  ensure  the  pose  distribution  of  P 
maintains  consistency  with  observation  Zt- 

Measurement  models  are  less  direct  than  the  odometry  motion  model,  reflecting 
the  vast  differences  in  observation  sensors  [61].  This  research  uses  a  range  scan  mea¬ 
surement  model  that  builds  error  by  occupancy  grid  cells  [33].  However,  a  scanning 
technique  that  uses  different  sensor  types,  amounts,  or  placement  must  change  its 
measurement  model  to  factor  this  new  physical  arrangement.  Another  difference  is 
in  the  map  representation,  where  changing  the  occupancy  grid  also  requires  a  dif¬ 
ferent  measurement  model  to  observe  map  0.  Details  of  the  measurement  model 
implemented  in  this  research  is  presented  in  Chapter  3. 

2.4.3  Alternative  Map  Representation. 

While  computational  complexity  between  EKF  based  SLAM  and  FastSLAM  is  a 
common  issue,  the  most  signihcant  contribution  to  running  time  is  often  associated 
with  storing  the  map  [20] .  Especially  when  many  particles  each  store  their  own  copy, 
a  large  environment  can  require  gigabytes  of  memory.  Much  research  focuses  on 
reducing  this  cost  by  making  use  of  different  map  representation  methods  [6,  8,  24,  25]. 
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An  algorithm  for  dividing  the  global  map  into  smaller  regions  allows  those  regions 
to  be  represented  in  a  tree  structure  [25] .  This  reduces  the  number  of  landmarks  to 
calculate  at  each  step,  and  only  consists  of  several  more  calculations  to  move  between 
regions.  The  primary  focus  of  this  strategy  is  to  close  loops  in  the  vehicle  path.  This 
means  that  returning  to  a  previous  location  is  successfully  mapped  to  the  same  spot. 

To  prove  the  effect  of  this  storage  implementation,  this  strategy  is  used  on  an 
enormous  data  set  of  one  million  landmarks  [24].  Such  a  data  set  is  simply  too  large 
for  an  EKF  to  hnish  calculating,  especially  considering  that  memory  was  still  the 
primary  contributor  in  the  improved  implementation  presented.  The  true  map  and 
constructed  map  of  these  tests  are  shown  in  Figure  6. 


(a)  (b)  (c)  (d) 

Figure  6.  Maps  from  a  million  landmark  data  set.  (a)  is  the  true  map  of  four  buildings 
explored,  (b)  is  a  closeup  of  one  building,  (c)  is  the  map  estimate  before  closing  the 
loop,  and  (d)  is  the  final  map  estimate  [24]. 

While  the  map  generated  from  the  algorithm  in  (d)  does  not  completely  replicate 
truth  (a),  it  is  able  to  improve  itself  from  (c)  by  using  landmark  information.  Fven  by 
visual  inspection,  (d)  appears  good  enough  to  navigate  with  and  succeeds  in  closing 
the  loops  to  previous  locations. 

Most  implementations  store  a  single  map  type,  but  others  maintain  several  local 
metric  maps  roughly  the  size  of  the  sensor  range.  These  are  combined  by  maintaining 
an  overall  topological  map  and  placing  the  local  metric  maps  within  it  [8].  This  results 
in  less  of  a  storage  requirement  and  helps  to  keep  local  areas  correctly  placed  relative 
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to  each  other.  Such  a  hybrid  approach  can  also  be  used  to  separate  the  map  into 
a  hierarchy,  updating  one  level  at  a  time  to  produce  one  metric  map  at  the  end  [6]. 
Figure  7  depicts  the  many  steps  involved  in  this  process. 
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Figure  7.  Maps  built  from  a  hybrid  sequential  process.  The  algorithm  stores  local 
metric  maps  (a)  while  building  an  overall  topological  map  (b).  It  then  finds  offsets 
between  poses  (c)  and  places  local  maps  into  a  global  one  (d).  Finally,  it  corrects  the 
vehicle  path  (e)  to  produce  a  final  map  (f)  [6]. 


Such  map  representations  are  effective  at  computing  large  data  sets  and  closing 
loops.  Landmark  storage  is  not  the  immediate  focus  of  this  work,  which  seeks  to 
simplify  the  mapping  process,  not  complicate  it.  This  research  holds  both  of  these 
strategies  as  potential  strategies  to  be  applied  later. 
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2.4.4  Comparing  Results. 


Due  to  the  number  of  different  outputs  obtained  from  algorithms  with  different 
storage  mechanisms,  it  is  often  difficult  to  compare  them.  Different  algorithms  use 
the  same  range  sensors  to  interpret  objects  as  occupied  cells  or  separate  landmarks. 
In  the  resulting  maps,  a  metric  occupancy  grid  does  not  resemble  a  topological  map 
and  both  are  different  from  a  scatter  plot  of  landmarks.  Visual  inspection  of  different 
representations  can  distinguish  what  looks  like  a  believable  map  from  one  full  of 
errors,  but  cannot  distinguish  between  more  similar  results  and  is  subject  to  human 
interpretation. 

One  paper  presents  a  more  precise  method  to  compare  the  results  of  several  dif¬ 
ferent  solutions  [10].  This  method  considers  vehicle  path,  one  pose  after  another,  to 
determine  which  output  is  closer  to  ground-truth.  Not  only  is  this  a  more  quantitative 
measure  than  visual  inspection,  but  it  is  more  effective  as  well. 

Because  most  algorithms  first  estimate  each  pose  then  build  the  map,  comparing 
the  poses  of  each  algorithm  results  in  a  more  signihcant  difference.  This  work  does  not 
implement  the  specihc  method  outlined,  but  instead  uses  a  similar  path  comparison 
rather  than  simply  discussing  the  differences  in  produced  maps. 

2.4.5  Advanced  Techniques. 

One  signihcant  factor  that  affects  SLAM  research  signihcantly  is  the  available 
hardware.  Many  solutions  make  use  of  a  range  hnder  to  detect  obstacles,  either 
through  RADAR,  SONAR,  or  Light  Detection  and  Ranging  (LIDAR).  While  often 
expensive,  these  sensors  are  also  incredibly  accurate  with  LIDAR  units  maintaining 
millimeter  to  centimeter  error  resolution  at  distances  up  to  tens  of  meters.  A  solution 
can  detect  distinct  differences  in  adjacent  readings  to  extract  landmarks  as  done  in 
early  SLAM  solutions.  Alternatively,  it  can  apply  each  scan  to  a  metric  grid  map  as 


done  in  this  research. 


The  advantage  of  using  such  an  accurate  sensor  conies  from  the  small  LIDAR 
error  relative  to  the  sensor.  This  is  much  different  from  the  error  of  a  pose  estimate 
obtained  through  vehicle  odometry  or  velocity  measurements.  Movement  error  from 
odometry  or  inertial  measurements  is  not  only  much  larger  than  the  range  errors  but 
is  also  cumulative,  often  compounding  quickly.  In  nearly  all  tests,  the  sensor  ranges 
have  much  less  error  than  vehicle  measurements  meaning  the  pose  estimates  cause 
map  inconsistencies,  not  the  sensor  ranges. 

To  leverage  this  fact  is  the  strategy  of  scan  matching.  By  matching  the  proposal 
distribution  to  scan  points,  the  solution  incorporates  range  measurements  into  the 
pose  estimate  before  the  map  is  built.  Scan  matching  is  an  adaptive  technique  that 
computes  a  more  accurate  proposal  distribution  by  considering  range  inputs  in  the 
motion  model  to  reduce  pose  uncertainty  [28]. 

Presented  scan  matching  research  has  improved  path  accuracy  and  reduced  the 
pose  distribution  to  require  less  particles  [30].  This  method  succeeds  in  complicated 
environments  using  three  dimensional  scanners.  The  additional  dimension  generates 
a  signihcant  amount  of  data,  but  the  principle  applies  to  smaller  and  two  dimensional 
environments  as  well  [9]. 

The  effect  of  scan  matching  is  best  illustrated  by  example  situations.  Around 
environment  obstacles,  situations  arise  where  the  algorithm  incorporates  the  accurate 
ranges  to  narrow  the  pose  distribution  [28].  Less  variance  benehts  the  motion  model 
by  reducing  pose  error  and  allowing  for  fewer  particles.  Three  map  areas  are  pictured 
in  Figure  8,  each  producing  a  different  effect  on  the  pose  distribution. 

By  implementing  scan  matching  improvements  to  a  RBPF  algorithm,  researchers 
could  reduce  the  number  of  particles  required,  cutting  memory  requirements  and 
computation  time  [29].  Using  the  principle  of  scan  matching,  the  most  recent  mea- 
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(a) 


(b) 


(c) 


Figure  8.  Mapping  scenarios.  In  an  open  corridor  (a),  ranges  detect  walls  on  either 
side  and  shape  the  pose  distribution.  In  a  corner  (b),  more  walls  mean  a  more  accurate 
estimate.  Without  walls  (c),  the  motion  model  distribution  is  much  larger  [28]. 


surement  is  considered  in  addition  to  vehicle  movement,  resulting  in  a  particle  distri¬ 
bution  with  less  variance.  The  RBPF  then  preceeds  with  weighting  particle  set  P  by 
comparing  the  current  scan  to  each  map. 

As  this  work  uses  an  accurate  LIDAR,  scan  matching  is  certainly  applicable. 
However,  scan  matching  is  used  to  decrease  the  particle  distribution  as  an  added 
improvement.  It  is  not  a  necessary  aspect  of  FastSLAM,  so  this  research  completes 
a  functional  SLAM  solution  before  considering  a  scan  matching  application  to  the 
results. 

Used  in  combination  with  scan  matching,  this  research  implements  an  adaptive 
resampling  process  to  reduce  the  risk  of  removing  accurate  particles  [29].  The  re¬ 
sampling  adapts  by  only  resampling  P  once  the  weighting  indicates  a  need,  rather 
than  at  every  time  or  after  a  fixed  interval.  This  strategy  is  also  implemented  in  this 
research,  as  both  the  RBPF  and  metric  map  techniques  are  similar. 

Other  algorithms  correct  the  vehicle  path  separately  from  the  map  in  an  iterative 
process  [50].  This  improves  the  path  of  an  otherwise  incorrectly  aligned  map,  but 
can  only  be  calculated  after  the  path  is  calculated,  making  it  an  offline  solution.  This 
work  also  dismisses  this  iterative  strategy,  as  it  focuses  on  an  online  implementation 
only. 
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There  are  many  different  SLAM  algorithms,  each  with  certain  advantages  in  spe- 
cihc  scenarios.  The  area  of  concern  in  this  work  is  indoor,  two  dimensional  environ¬ 
ments.  This  simplihes  the  implementation  of  the  algorithm  and  decreases  complexity 
of  the  sensors.  The  result  is  less  overall  computation  and  memory  requirement,  al¬ 
lowing  more  tests  to  be  conducted. 

2.5  Machine  Vision  Localization  Mapping 

Machine  vision  is  a  critical  component  of  artihcial  intelligence  and  robotics  re¬ 
search  for  same  reason  eyesight  is  a  critical  component  of  biological  life.  Image 
processing  techniques  can  be  applied  immediately  to  accomplish  navigation  and  po¬ 
sitioning  tasks  just  as  humans  and  animals  detect  objects  and  movement  through 
our  vision.  This  section  presents  some  recent  research  in  tracking  image  features  and 
using  them  for  navigation  and  mapping. 

This  section  discusses  the  tracking  algorithms  that  provide  a  basis  for  FV-SIFT 
and  cover  its  operation.  It  also  presents  recent  research  efforts  in  mapping  using 
cameras,  most  of  which  is  currently  unavailable.  The  details  of  the  image  processing 
techniques  in  this  work  are  discussed  in  the  next  chapter. 

In  this  document,  the  word  “feature”  specihcally  refers  to  a  spot  of  an  image, 
whereas  “landmark”  is  an  important  navigation  point.  An  algorithm  recognizes  a 
physical  object  as  a  feature  before  interpreting  it  as  a  landmark  in  later  steps. 

2.5.1  Feature  Tracking. 

To  save  only  relevant  information,  machine  vision  systems  often  use  the  concept 
of  feature  extraction,  the  process  of  hnding  points  of  interest  in  an  image,  called 
features  [54].  Feature  tracking  hnds  the  set  of  features  in  one  image  and  the 
set  of  features  Fb  in  another  image.  It  then  matches  a  subset  of  Fa  fl  Fb  to  those 
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features  it  finds  between  images.  This  process  implies  several  difficult  steps,  such 
as  recognizing  what  features  were  previously  identified  and  tracking  their  movement 
between  images. 

Determining  how  well  an  algorithm  recognizes  features  is  difficult,  as  features  can 
change  in  any  combination  of  position,  rotation,  or  size  from  one  image  to  another  [40]. 
In  addition,  features  appear  different  under  changing  lighting  conditions  and  from 
different  perspectives. 

The  feature  tracking  problem  is  the  first  step  in  using  vision  for  an  application  such 
as  navigation.  When  computation  hardware  became  available,  researchers  presented 
a  process  to  determine  which  features  to  find  and  track  [54].  The  critical  method 
is  to  select  a  limited  number  of  features  based  on  how  well  they  will  be  tracked  in 
subsequent  images. 

Subsequent  methods  in  image  feature  extraction  were  shown  to  be  more  robust 
to  change.  One  extraction  method  named  the  Scale-Invariant  Feature  Transform 
(SIFT)  produces  features  that  are  more  distinctive  and  less  likely  to  be  mistaken  for 
others  [40].  It  focuses  on  the  property  of  scale  invariance,  or  how  well  the  points 
being  tracked  are  recognized  as  their  scale  changes.  These  good  features  are  essential 
to  the  performance  of  the  system,  whose  goal  is  to  correctly  label  as  many  as  possible 
in  the  next  image.  Although  first  shown  to  be  effective  for  static  object  recognition, 
it  has  also  been  used  for  image  navigation  applications  like  SLAM  [53]. 

SIFT  produces  accurate  results,  although  subsequent  research  presents  two  suc¬ 
cessful  alternatives,  both  aiming  to  reduce  the  required  computation  time  when 
considering  real-time  applications.  One  method  is  Speeded-Up  Robust  Features 
(SURF)  [5].  It  works  by  approximating  the  features  detection  and  description,  de¬ 
creasing  the  number  of  calculations  and  time  required.  Another  method  uses  a  differ¬ 
ent  feature  description  by  calculating  differences  in  pixel  bins,  and  is  called  Histogram 
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of  Oriented  Gradients  (HOG)  [14],  Even  though  HOG  is  introduced  for  human  de¬ 
tection,  it  achieves  similar  results  as  SURF  for  feature  extraction  and  also  requires 
much  less  time  than  SIFT. 

The  applicable  navigation  implementation  was  designed  for  SIFT  to  track  features 
for  use  in  navigation  [65].  This  work  describes  using  a  hlter  discussed  in  a  later  section 
to  estimate  a  vehicle  path.  The  algorithm  has  been  used  with  both  SIFT  features  and 
more  recently  faster  SURF  features,  which  provides  the  basis  for  the  vision  system 
used  in  this  research.  The  implementation  presented  in  the  following  chapter  contains 
many  of  the  same  hardware  components. 

2.5.2  Image  Feature  Egomotion. 

Many  SLAM  implementations  have  presented  results  using  odometry  and  LIDAR, 
but  others  present  the  use  of  other  input  sensors.  Some  use  vision  cameras  for  hard¬ 
ware  and  algorithms  to  extract  movement.  The  process  of  using  sequential  images 
for  navigation  is  called  egomotion,  as  used  extensively  for  rover  navigation  when 
paired  with  odometry  [48,  49].  Two  of  the  Mars  Exploration  Rovers  (MER)  utilize 
this  strategy,  carrying  stereo  cameras  to  estimate  an  alternate  path  when  traversing 
rough  terrain  where  odometry  is  less  accurate  [41]. 

Perhaps  the  first  published  article  using  only  vision  features  as  input  for  the 
SLAM  problem  utilized  SIFT  and  was  presented  at  a  similar  time  [53] .  Alternative  to 
hltering  techniques,  an  iterative  algorithm  presents  a  reasonable  method  to  generate 
a  path  [44].  Until  recently,  however,  the  application  of  these  methods  to  SLAM  has 
not  been  extensively  explored. 

A  motivation  for  alternative  inputs  emerges  as  solutions  become  more  accurate 
and  computation  speeds  increase.  This  allows  more  experiments  to  be  done  using 
existing  techniques  with  different  sensors.  For  example,  a  RBPF  SLAM  solution 
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using  visual  features  as  input  can  result  in  a  more  accurate  map  than  pose  estimates 
from  odometry  measurements  [26,  55,  56].  These  papers  explain  cost  as  a  signihcant 
factor,  as  highly-accurate  LIDAR  units  are  often  several  thousands  of  dollars.  While 
some  can  afford  a  limited  number  of  such  sensors,  the  application  of  SLAM  capabilities 
often  demands  a  cheaper  solution. 

Another  consideration  is  the  size  and  practicality  of  large  sensor  equipment. 
Money  is  not  the  only  design  cost,  as  LIDAR  units  often  require  more  power  than 
motors  and  computers  and  contribute  signihcantly  to  weight.  Machine  vision  may 
require  more  computation,  but  for  a  smaller  application  it  also  requires  much  less 
hardware.  Cameras  used  have  been  as  small  as  two  lenses  mounted  onto  a  moni¬ 
tor,  which  produces  clear  enough  images  to  create  a  map  similar  to  odometry  based 
poses  [51,  22].  Other  vehicles  can  be  built  with  complicated  rigs  of  several  cameras 
taking  a  full  panoramic  view  of  the  surroundings  [35].  This  allows  for  features  to  be 
tracked  from  one  camera  to  another  as  the  vehicle  turns  or  revisits  a  location  at  a 
different  orientation.  This  implementation  results  in  a  map  with  more  obstacles  de- 
hned,  but  requiring  complicated  software  and  signihcant  memory  to  operate  so  many 
cameras  at  once. 

The  details  of  egomotion  algorithms  are  often  not  provided  in  the  presented  re¬ 
search.  The  basic  process  relies  on  epipolar  geometry  to  calculate  focal  points  from 
stereo  cameras  [1].  The  algorithms  match  image  features  between  cameras  and  be¬ 
tween  time  steps,  then  translate  their  movement  into  sensor  motion.  All  egomotion 
calculations  are  discussed  in  the  Methodology. 

2.5.3  Additional  Approaches  &;  Applications. 

Most  research  has  focused  on  indoor  or  urban  environments,  seeking  to  create  a 
hoor  plan  or  street  layout.  However,  the  SLAM  problem  applies  outdoors  just  as  easily 
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with  many  of  the  same  solutions.  One  stereo  vision  SLAM  solution  uses  an  EKF  to 
map  underwater  [59].  The  presented  system  is  unique  as  it  maps  in  three  dimensions 
and  calculates  in  real-time;  most  solutions  ignore  these  otherwise  applicable  aspects. 

A  ground  example  presents  a  method  of  map  storage  specihcally  for  environments 
where  landmarks  and  obstacles  have  different  heights  [42] .  This  is  not  as  applicable  in 
hallways,  where  flat  walls  are  restricted  by  a  ceiling,  but  does  signihcantly  distinguish 
rocks  from  trees  from  buildings.  Another  difference  in  this  research  from  others  is 
that  it  uses  a  metric  grid  for  map  storage  as  many  LIDAR  solutions  hnd  it  easy  to 
do.  This  is  in  comparison  to  storing  landmark  locations  as  is  otherwise  done  in  vision 
solutions.  This  corresponds  more  closely  to  common  RBPF  solutions,  implying  their 
integration  is  possible. 

A  different  approach  beginning  to  emerge  is  that  of  machine  learning.  One  could 
consider  every  SLAM  solution  to  be  an  agent  that  learns  the  map  with  each  subse¬ 
quent  measurement,  but  the  presented  solutions  do  not  build  upon  this.  Approaching 
SLAM  from  a  machine  learning  perspective  yields  a  solution  that  requires  much  less 
vision  hardware  and  successfully  navigates  [32].  This  implementation  makes  use  of 
only  a  single  camera,  which  observes  the  floor  as  opposed  to  walls.  Its  primary  suc¬ 
cess  was  the  determination  that  cameras  are  subject  to  blur  effects  not  seen  by  other 
measurements,  and  mitigating  this  effect  results  in  more  accurate  image  data. 

One  approach  using  symmetric  regions  based  on  the  HOG  descriptor  suggests  that 
feature  extraction  methods  are  an  effective  option  in  further  research  [38].  Other 
recent  research  uses  inertial  data  combined  with  egomotion  to  show  that  a  calibrated 
system  can  perform  in  multiple  environments  with  minimal  drift  [34].  In  both  cases, 
the  results  show  that  camera  based  methods  are  promising  sensors  for  SLAM  solutions 
where  maintaining  an  accurate  path  and  landmark  locations  are  critical. 

SLAM  is  still  a  relatively  recent  problem,  and  there  is  certainly  more  work  that 
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will  be  devoted  to  algorithms  making  use  of  different  combinations  of  new  sensor 
technologies  and  solution  strategies.  As  more  environments  are  explored  and  more 
applications  are  considered,  additional  algorithms  work  to  provide  them  with  func¬ 
tional  solutions.  It  is  likely  that  only  few  of  the  potential  SLAM  solutions  have  been 
presented,  and  there  is  a  signihcant  amount  of  robotic  mapping  yet  to  be  done. 
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III.  Methodology 


This  chapter  presents  a  modified  FastSLAM  implementation  that  incorporates  a 
novel  combination  of  multiple  input  sensors.  Where  previous  research  used  separate 
sensors  to  create  separate  paths,  this  work  combines  stereo  image  egomotion,  inte¬ 
grated  inertial  measurements,  and  odometry  to  create  a  single  path  using  a  linear 
Kalman  filter.  This  path  combination  will  be  referred  to  as  the  Multiple  Integrated 
Navigation  Sensors  (MINS)  system. 

The  MINS  system  sends  its  path  to  a  FastSLAM  implementation  as  its  input 
path.  FastSLAM  uses  this  state  to  propagates  its  particles,  then  uses  LIDAR  ranges 
to  measure  error  and  generate  occupancy  grid  maps.  Lastly,  it  weights  its  particles 
based  on  the  error  and  resamples  them  if  necessary. 

This  chapter  first  begins  with  an  overview  of  the  novel  MINS  system  as  extended 
from  principals  in  Chapter  1.  The  following  sections  cover  the  process  of  extracting 
motion  from  a  set  of  stereo  images,  and  a  description  of  inertial  data  integration. 
Next  is  a  description  of  how  MINS  filters  pose  estimates  from  egomotion,  inertial 
movement,  and  odometry  together.  Following  this,  this  chapter  describes  the  process 
for  integrating  the  combined  poses  into  a  RBPF  implementation. 

3.1  Overview 

The  inputs  and  outputs  of  a  SLAM  solution  conform  to  a  DBN  on  a  fundamental 
level.  Because  this  research  stores  the  environment  in  a  single  map  rather  than  a  set  of 
landmarks,  it  demands  a  different  DBN  representation  than  the  one  discussed  in  the 
first  two  chapters.  Figure  9  depicts  a  DBN  representing  map  0  as  a  single  hidden  node 
rather  than  several  landmarks.  The  single  map  condenses  the  desired  solution  into 
one  object  that  affects  each  observation,  instead  of  maintaining  associations  between 
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every  landmark  at  each  time  step  t. 


Figure  9.  The  SLAM  problem  as  a  dynamic  Bayes  network  storing  a  single  map.  A 
vehicle  travels  through  hidden  states  Xg-.t,  observing  controls  ui-,t  and  observations  zg-.t- 
A  single  hidden  map  0  affects  all  observations  [8]. 

In  this  implementation  with  a  single  map,  vehicle  odometry  acts  as  the  controls 
Ui:t,  driving  the  vehicle  along  its  path.  The  vehicle  takes  LIDAR  measurements  as 
observations  Zt  of  the  environment  0  at  each  pose  within  the  state  x*.  Thus,  a 
SLAM  solution  gives  the  values  for  the  hidden  nodes  of  the  map  0  and  current  state 

Xt. 

When  equipped  with  stereo  cameras  at  equal  height  and  inertial  measurements 
in  two  dimensions,  the  vehicle  gains  additional  input  through  image  egomotion  and 
numeric  integration  respectively,  as  discussed  in  detail  later.  The  novel  aspect  of  this 
research  combines  these  inputs  with  odometry  in  a  linear  Kalman  hlter  as  the  MINS 
system.  MINS  provides  a  navigation  solution,  leaving  map  construction  to  a  separate 
FastSLAM  implementation. 

The  linear  Kalman  hlter  takes  a  pose  change  measurement  6s  from  each  sensor  as 
input.  It  uses  these  to  maintain  a  state  x  and  covariance  S  of  a  combined  pose  with 
associated  accuracy.  A  separate  particle  hlter  then  takes  Kalman  hlter  state  x  as 
its  control  input  u  to  process  a  SLAM  solution.  The  particle  hlter  also  incorporates 
the  LIDAR  scan  z  to  construct  maps,  separating  this  functionality  from  the  MINS 
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sensor  integration.  An  overall  system  diagram  of  MINS  and  FastSLAM  is  shown  in 
Figure  10.  It  has  four  inputs  from  the  four  sensors,  and  maintains  the  particle  hlter 
states  of  a  pose  s  and  map  0  as  system  output. 


Figure  10.  System  diagram  of  the  novel  SLAM  solution.  A  Kalman  filter  takes  twice- 
integrated  IMU  measurements,  egomotion  from  stereo  images,  and  vehicle  odometry 
as  input.  It  then  provides  m  to  a  particle  filter  that  also  takes  LIDAR  ranges  2;. 

As  the  IMU  records  accelerations,  MINS  integrates  each  measurement  to  a  ve¬ 
locity,  then  again  to  a  position.  These  positions  provide  the  Kalman  hlter  with  its 
prediction  step.  Each  time  an  image  pair  is  available,  MINS  calculates  an  egomotion 
pose  and  provides  it  to  the  Kalman  hlter  as  an  observation  step.  Each  odometry  mea¬ 
surement  is  also  a  pose  that  provides  another  Kalman  hlter  observation.  Iterating 
over  the  series  of  prediction  and  observation  steps  updates  the  Kalman  hlter  state  x 
and  covariance  S. 

After  each  odometry  observation,  at  which  time  there  may  or  may  not  have  been 
an  egomotion  update,  the  Kalman  hlter  sends  its  output  to  the  particle  hlter  as 
the  control  input  u.  Combined  with  LIDAR  measurement  2;,  FastSLAM  maintains 
particle  states  each  with  a  pose  s  and  map  0.  These  states  are  the  hidden  variables 
of  the  DBN,  thus  providing  the  SLAM  solution. 
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3.2  Camera  Input 


This  section  first  discusses  FV-SIFT  in  more  detail  as  it  motivates  this  work.  The 
hardware  arrangement  and  data  collection  processes  are  identical  for  both  systems, 
although  this  research  only  uses  FV-SIFT  as  a  comparison  to  previous  methods.  Fol¬ 
lowing  FV-SIFT,  this  section  describes  the  process  for  extracting  feature  locations 
and  ego  motion  algorithms.  MINS  calculates  all  image  and  inertial  components  inde¬ 
pendent  of  the  FV-SIFT  methods. 

3.2.1  Image-Inertial  Navigation. 

This  research  begins  with  the  FV-SIFT  navigation  system  as  presented  in  previous 
work  [65].  Designed  to  use  an  EKF  to  estimate  poses,  a  solution  is  also  valid  using  an 
alternative  UKF  implementation  [19].  The  UKF  version  succeeds  in  navigating  but 
does  not  reduce  the  position  error  of  the  original  EKF  solution  and  takes  a  longer  time 
to  compute.  Regardless  of  its  hltering  technique,  the  system  functions  as  depicted  in 
Figure  11,  resulting  in  a  six  Degrees  of  Freedom  (DOF)  pose  change,  with  positions 
Sp^  and  rotations  ip  in  three  dimensional  space. 

While  an  Inertial  Measurement  Unit  (IMU)  integrates  its  acceleration  measure¬ 
ments  to  position  data,  two  stereo  cameras  save  images.  An  algorithm  (either  SIFT  [40] 
or  SURF  [5])  extracts  features  from  both  images  attempting  to  match  and  locate  each 
feature.  The  system  saves  the  best  twelve  features  as  landmarks  to  be  matched  in 
subsequent  images.  It  continues  to  track  matched  landmarks  and  use  their  observed 
locations  to  update  the  pose  from  integrated  IMU  measurements. 

FV-SIFT  performs  much  better  on  aircraft  than  it  does  on  a  small  indoor  vehicle, 
but  operates  in  either  environment  [65].  Additionally,  it  is  highly  subject  to  the 
accuracy  of  the  IMU,  resulting  in  a  large  error  difference  depending  on  the  unit  used. 
The  path  quality  of  the  navigation  system  is  also  subject  to  feature  loss,  which  results 
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Figure  11.  Operation  of  the  FV-SIFT  system.  It  detects  image  features,  predicts  their 
future  location,  and  attempts  to  match  them  in  subsequent  images  with  movement  [65]. 


in  unrecoverable  errors  after  only  a  few  seconds  without  observing  some  or  all  of  its 
twelve  landmarks.  As  a  result  of  this  loss,  it  relies  completely  on  its  IMU  and  rapidly 
drifts  away. 

Too  much  reliance  on  one  system,  either  image  features  or  inertial  data,  is  mo¬ 
tivation  to  seek  more  consistent  navigation  sensors.  Sudden  movements,  collisions, 
and  vibrations  greatly  affect  an  IMU  but  do  not  affect  odometry  as  much.  As  this  re¬ 
search  concerns  indoor  ground  vehicles,  it  assumes  odometry  is  available  as  an  input. 
A  system  with  stereo  cameras  can  adjust  vehicle  odometry  and  IMU  measurements 
with  its  own  path.  Even  if  each  of  the  inputs  are  not  particularly  accurate,  hltering 
with  an  EKF  or  RBPF  improves  raw  odometry  using  egomotion  to  calculate  a  path 
of  its  own  [55]. 

The  problems  with  FV-SIFT  are  the  reason  for  the  implementation  decisions  that 
follow  in  this  chapter.  It  tightly  couples  the  images  and  inertial  data,  but  also  tightly 
integrates  itself  to  its  EKF  and  cannot  be  easily  extended  due  to  specihcs  of  its 
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implementation.  MINS  separates  the  cameras  from  the  IMU  data  until  it  combines 
them  with  odometry  in  a  linear  Kalman  hlter.  This  way,  each  input  sensor  does  not 
rely  on  another  and  MINS  is  robust  in  areas  that  are  problematic  for  FV-SIFT. 

3.2.2  Feature  Tracking  &;  Stereopsis. 

Camera  egomotion  is  an  older  problem  than  SLAM,  and  researchers  now  seek  to 
combine  the  separate  helds  [26,  55].  Several  algorithms  exist  for  calculating  three  di¬ 
mensional  motion  can  be  applied  to  mobile  robotics  operating  on  a  planar  surface  [1]. 
The  mathematical  solution  using  the  epipolar  geometry  presented  in  this  paper  pro¬ 
vides  the  basis  for  the  feature  processing  used  in  this  research.  The  implementation 
is  based  on  stereopsis,  the  ability  to  approximate  distances  through  stereo  images. 

To  take  advantage  of  stereopsis,  MINS  matches  the  SIFT  features  from  the  left 
camera  image  to  those  from  the  right  camera  image  at  the  same  time.  The  most 
straightforward  way  to  compare  two  SIFT  descriptors  and  determine  a  match  between 
them  is  to  take  their  Euclidean  distance;  however,  the  descriptor  for  each  feature 
consists  of  128  values.  For  computational  purposes,  MINS  calculates  the  inverse 
cosine  of  the  dot  products  of  each  pair  of  left  camera  descriptors  and  right  camera 
descriptors  to  hnd  a  distance  D  between  each  descriptor  pair.  This  is  a  similar 
measure  to  Euclidean  distance  but  faster  to  calculate  as  shown  in  Equation  13. 

D  =  arccos(d^  ■  d^)  (13) 

MINS  then  Ends  the  minimum  distance  value  dj  for  each  feature  i,  where  feature 
index  j  the  closest  match  to  i  in  the  other  image.  If  dj  is  less  than  g  times  the  second 
smallest  distance  value  in  D  that  is  not  dj,  MINS  matches  the  feature  and  saves  its 
pixel  coordinates  in  and  P^.  The  elements  of  each  are  assigned  as  described  in 
Equation  14. 
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pR  =  {pf,  pf  ;  min(P)  <  ^  min(P  -  dj)} 


(14) 


Because  the  vehicle  travels  on  a  flat  surface  with  cameras  at  equal  height,  the 
stereo  images  should  detect  only  horizontal  differences  and  not  vertical  ones.  There¬ 
fore,  to  reduce  the  number  of  false  matches,  MINS  discards  feature  matches  in  the 
other  image  that  measure  greater  than  d  from  horizontal.  This  adds  an  additional 
condition  to  the  elements  of  and  P^,  which  is  described  in  Equation  15  where  nc 
is  horizontal  image  resolution. 


P''  =  {pf,  pf  :|atan2(|/f-|/f,a;f-a;f  +  nc)|<^}  (15) 

At  this  point,  MINS  has  a  set  of  features  correctly  matched  in  the  current  image 
pair.  The  complete  process  for  matching  left  and  right  features  is  shown  in  Algo¬ 
rithm  4. 


Algorithm  4  Feature  Matching 

for  all  features  i  at  pf  in  the  left  image  do 
D  =  arccos(df  ■  d^) 
j  =  argmin(P) 
remove  dj  from  D 
if  dj  <  Q  min(P)  then 

add  pf  to  P^  and  pf  to  P^ 
end  if 

=  xf  —  +  nc 

ART 

y  =  Vj  -  Vi 

if  |atan2(|/^,  a;"^)|  >  d  then 

remove  pf  from  P^  and  pf  from  P^ 

end  if 
end  for 

retnrn  P^,  P^ 


The  result  of  feature  matching  are  the  pixel  coordinates  P^  and  P^  of  the  set 
of  features  to  track  in  the  left  and  right  images,  respectively.  Before  the  coordinates 


43 


can  be  accurately  used,  MINS  removes  lens  distortion  effects  by  performing  twenty 
iterations  through  the  CalTech  distortion  model  procedure  [65]. 

To  do  this,  models  for  each  camera  contain  calibrated  parameters  for  these  calcu¬ 
lations.  These  include  the  principal  point  and  focal  length  values,  each  with  x 
and  y  components.  Additional  parameters  include  hve  values  indexed  by  brack¬ 
ets.  Algorithm  5  uses  this  model  to  remove  lens  distortion  from  each  G  the 
algorithm  is  identical  for  the  right  camera,  substituting  superscript  R  for  each  L. 
Algorithm  5  Removing  Lens  Distortion 

^distort  =  (P^  -  Pc)/fc 

n  ^distort 

for  1  to  20  do 

r  =  rixTix  +  nyUy 

Kadiai  =  1  +  ^h[^]r  +  k^[2]r2  k^[5]r^ 

hy  =  2k^[3]n^ny  +  k^[4](r  -h  21%^) 

K  =  kc[3](r  -h  2nl)  +  2k^[A]nxny 

n  (kdistort  k'jjkradial 

end  for 

return  p*^  =  T^[— na,  Uy  1] 

After  MINS  executes  Algorithm  5  to  remove  distortion  effects  from  and  P^,  it 
can  use  their  true  coordinates  as  obtained  through  the  last  line.  As  measured,  feature 
coordinates  within  camera  pixels  are  unhelpful  until  projected  into  actual  locations 
outside  the  cameras.  To  be  useful  for  navigation,  MINS  uses  epipolar  geometry  to 
measure  the  physical  location  of  each  feature  [1]. 

Epipolar  geometry  calculations  require  information  from  models  of  both  cameras, 
which  have  saved  images  at  me  x  ric  pixel  resolution.  The  hrst  step  in  determining  the 
physical  location  is  to  transform  the  coordinates  of  the  features.  This  is  done  using 
a  Direction  Cosine  Matrix  (DCM)  to  convert  pixels  to  locations  and  another  to 
convert  locations  to  the  correct  reference  frame  C^.  Equation  16  expresses  a  feature 
in  coordinates  appropriate  to  the  left  camera.  MINS  performs  the  same  calculations 
for  each  feature  using  the  models  for  the  left  and  right  camera. 
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r  1  ^ 

-  \mcl‘A 

o^  =  C^(T^)-^  l/^-K/21  (16) 

1 

The  algorithm  then  combines  transformed  coordinates  and  to  take  advan¬ 
tage  of  stereopsis  by  using  the  distance  from  the  right  to  the  left  camera  o^.  MINS 
computes  this  using  multiple  inner  products  in  Equation  17  to  obtain  the  location  f'^ 
as  seen  from  the  left  side. 


r  _  (o^  ■  o-^)(o^  ■  Oc)  -  (o^  ■  o^)(o^  ■  Oc) 

(o^  ■  o^)(o^  ■  o^)  —  (o^  ■ 


A  similar  calculation  in  Equation  18  gives  the  location  of  one  feature  with 


respect  to  the  right  side. 


fR  _  (o^  •  o^)(o^  •  Oc)  -  (o^  •  o^)(o^  ■  Oc)  .  . 

(o^  ■  o^)(o^  ■  o^)  —  (o^  ■  o^)^ 

With  both  and  MINS  simply  averages  the  two  values  to  obtain  feature 
location  1.  Each  1  is  expressed  relative  to  the  vehicle  cameras  and  thus  relative  to  the 
current  vehicle  pose.  This  allows  MINS  to  maintain  a  record  of  all  image  features 
seen  in  both  images  with  their  observed  locations. 

The  overall  process  for  obtaining  each  location  1  from  and  is  described 
in  Algorithm  6.  Vectors  are  denoted  by  bold  lowercase  letters  and  matrices  by  bold 
uppercase  letters;  multiplication  and  dot  products  are  written  explicitly. 

Each  location  1  must  be  rotated  from  its  value  in  Algorithm  6  to  match  the 
coordinate  system  of  the  vehicle.  With  all  two  dimensional  tracked  feature  locations  1, 
MINS  considers  the  these  feature  locations  landmarks.  It  then  uses  them  to  construct 
an  estimate  of  vehicle  motion  independent  of  other  common  inputs  such  as  inertial 
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Algorithm  6  Feature  Locations  from  Pixel  Coordinates 

o‘  =  Ci(Ti)-i[xJ-rmc/21  1] 

o«  =  Cg(Tg)-' -  rmc/21  y^-\ncm  l] 
a  = 
b  = 
c  = 

d  =  ■  Of7 

e  =  ■  oc 

=  (b  e  —  c  d)/(a  c  —  b^) 

=  (a  e  —  b  d)/(a  c  —  b^) 
return  1  =  (f^o^  +  f^o^)/2 

or  odometry  measurements. 

3.2.3  Stereo  Egomotion. 

MINS  uses  each  landmark  location  IJ,  where  i  is  the  landmark  index  and  t  is  the 
current  time  step.  It  consists  of  rectangular  position  x\  and  yl  as  measured  from  the 
current  pose  of  the  vehicle,  as  dehned  in  Equation  19. 

Ij  ^  ‘  (19) 

yl 

The  algorithm  assumes  that  the  only  thing  moving  in  the  environment  is  the  ve¬ 
hicle  and  not  the  landmarks.  Therefore,  it  measures  motion  from  landmark  i  tracked 
at  time  t  as  it  has  a  current  location  \\  and  a  previous  location  1)_^.  This  makes  it 
possible  to  extract  a  vehicle  pose  change,  because  the  algorithm  interprets  motion  in 
each  landmark  to  be  a  result  of  vehicle  motion. 

There  are  K  tracked  landmarks,  which  changes  at  each  time  t.  MINS  attempts 
to  reduce  the  number  of  bad  matches  by  eliminating  those  that  travel  more  than 
ai  standard  deviations  from  the  mean  distance  d.  The  calculation  in  Equation  20 
removes  outliers  from  corrupting  observed  motion,  where  ad  is  the  statistical  standard 
deviation  of  the  numerator  for  each  landmark  i. 
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(20) 


d*  = 


\xi-xi_^\  +  \yi-yi_^\  -d 


MINS  then  uses  a  polar  representation  to  measure  movement.  The  rectangular 
coordinates  of  P  are  an  angle  and  a  radius  calculated  in  Equation  21. 


9^ 

_ 

atan2(|/*,a;*) 

A/(a;*)2  -h  {y^y 

(21) 


Finally,  MINS  measures  the  vehicle  change  from  t  —  1  to  t  by  subtracting  \\  from 
in  polar  form.  Note  that  this  is  the  negative  of  typical  motion,  as  the  algorithm 
is  translating  landmark  movement  in  one  direction  as  vehicle  movement  in  the  other. 
This  subtraction  results  in  a  position  difference  for  each  landmark  i  is  Equation  22. 


69^ 

Ol-i 

1 — - 

1 

_ 1 

1 

7 

- 1 

(22) 


As  a  hnal  step,  the  algorithm  takes  the  mean  of  each  landmark  difference  repre¬ 
sented  by  elements  9  and  f,  then  converts  the  coordinates  back  to  rectangular  form. 
This  creates  egomotion  pose  difference  computed  in  Equation  23. 


r  cos(6') 
f  sin(0) 
9 


(23) 


The  overall  process  for  extracting  pose  change  from  the  set  Lt  of  all  locations 
1  at  time  t  is  described  in  Algorithm  7.  A  bar  over  a  variable  indicates  its  mean  and 
a  indicates  a  statistical  standard  deviation. 

As  MINS  tracks  K  landmarks.  Algorithm  7  calculates  the  angle  and  distance  r* 
of  each  landmark  i  for  both  t  —  1  and  t.  It  then  calculates  each  difference  d*  between 
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Algorithm  7  Landmark  Egomotion 
for  all  time  steps  t  do 
for  i  =  1  to  A  do 

n  =  \x\-x\_^  \  +  \yi-yi_^\ 
d*  =  |n  -  d\/{anY 
if  d*  >  (Trf  then 

remove  P  from  Lt  and  from  Lt_i  / /  this  decreases  K 

end  if 
end  for 

for  i  =  1  to  A  do 

di  =atan%|^xj)_ 

dj_i  =atan%^_^X_i)_ 

rti  =  ^/{xi_^y  +  {yi_i)^ 

end  for 

60  =  6t-i  —  Ot 
6r  =  rt-i-rt^ 

6s^^°  =  [f  cos(d)  fsin(d)  9]'^ 

end  for 


the  landmark  locations  in  the  most  recent  interval.  MINS  calculates  means  f  and  6 
as  the  total  movement  of  all  landmarks. 

Assuming  the  landmarks  are  stationary,  MINS  interprets  their  movement  relative 
to  the  vehicle  as  vehicle  movement.  It  converts  f  and  6  back  to  rectangular  coordi¬ 
nates,  which  is  the  difference  in  vehicle  pose.  This  difference  6s^^°  serves  as  the  pose 
difference  between  t  —  1  and  t  as  observed  by  egomotion. 


3.3  Inertial  Input 

Simply  calculating  6s^^°  is  not  a  viable  navigation  path  by  itself.  As  mentioned 
earlier  in  this  chapter,  a  navigation  solution  takes  a  risk  when  relying  on  a  single 
input  due  to  specihc  disadvantages  in  the  sensor  and  the  errors  that  arise  from  them. 
The  reason  for  the  success  of  the  image  aided  inertial  navigation  system  discussed  is 
indeed  the  combination  of  its  two  inputs  [65]. 

The  MINS  system  presented  in  this  research  also  uses  an  IMU  to  obtain  a  similar 
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inertial  input  pose  difference  With  comparable  estimate  from  the  cameras  in 

it  seeks  a  similar  estimate  from  the  IMU  in  and  as  measured  by  odometry 

in  Vehicle  odometry  records  data  directly,  but  MINS  must  calculate 

from  raw  IMU  measurements  before  combining  the  pose  differences. 

The  IMU  contains  an  accelerometer  and  a  gyroscope  that  measure  specihc  forces 
and  rotational  velocities  [63] .  With  these  instruments,  the  IMU  records  accelerations 
due  to  gravity  and  external  forces  that  cause  movement.  In  three  dimensions,  it 
records  measurements  in  six  DOF.  This  research  only  concerns  movement  in  two 
dimensions,  or  three  DOF,  with  a  stationary  IMU.  Because  this  research  limits  the 
area  of  interest  to  flat  surfaces,  it  assumes  zero  vertical  movement,  roll,  and  pitch. 

These  other  values  are  ignored  and  MINS  is  left  with  three  orthogonal  forces, 
one  of  which  is  due  to  gravity  and  is  discarded.  MINS  only  considers  the  other  two 
accelerations  x,  y,  and  yaw  velocity  u.  Therefore,  the  IMU  measures  two  dimensional 
instantaneous  acceleration  at  time  t  in  Equation  24. 


at  = 


yt 


(24) 


Because  navigating  requires  a  position,  IMU  measurements  a*  must  be  integrated 
twice,  and  Ut  must  be  integrated  once  between  each  time  interval  from  t  —  1  to  t  [63] . 
To  limit  the  effect  of  drifting  due  to  this  integration,  programs  interfacing  with  an 
IMU  commonly  track  a  bias  [65] .  The  vehicle  in  this  research  always  begins  stationary, 
so  MINS  leverages  information  from  the  available  initial  data  to  correct  this. 

Until  the  odometry  indicates  any  vehicle  movement,  MINS  calculates  IMU  mea¬ 
surement  bias  bf  by  taking  a  moving  average  of  Ut-  This  is  shown  in  Equation  25, 
where  r  is  the  total  number  of  measurements  included  in  ht-i  [65]. 
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(25) 


UJ,  +  Tbt^j 

With  bt  calculated,  MINS  tracks  dt,  the  time  interval  since  previous  IMU  measure¬ 
ment  at_i.  It  uses  each  of  these  values  to  approximates  a  three  DOF  instantaneous 
velocity  in  Equation  26  [63]. 


xt  xt-i  +  dt  xt 

vt  =  =  yt_i  +  dt  jjt  (26) 

Ot  (Xt  -  bt 

To  mitigate  further  acceleration  errors  from  producing  unrealistic  velocities,  MINS 
includes  a  novel  step.  Because  it  knows  the  physical  limits  of  the  vehicle,  it  limits  its 
speed  Vt  to  the  known  vehicle  maximum  Vmax-  By  computing  speed  as  the  magnitude 
of  linear  velocity,  MINS  scales  the  elements  of  v*  by  integrated  overall  speed  Vt  at 
time  t  without  affecting  the  angle  of  travel.  This  calculation,  shown  in  Equation  27, 
is  only  applied  if  Vt  is  greater  than  Vmax- 


Xt/Vixt)"^  +  {xt^ 
=  yt/\/{xtf  +  {ytf 


A  similar  process  to  the  velocity  calculation  performs  numeric  integration  to  bring 
velocity  to  position  [63].  Equation  28  approximates  the  current  IMU  estimate  of  the 


pose  difference  5s\ 


=  ^  w_i+v, 


MINS  handles  tracking  bt  and  limiting  v*  for  each  time  step  t  of  IMU  data  at  time 
interval  dt.  MINS  must  store  only  the  current  bias  bt,  the  number  of  time  steps  r, 
the  previous  velocity  vt_i,  and  nothing  more.  The  full  process  for  obtaining 
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from  at  is  shown  in  Algorithm  8. 


Algorithm  8  IMU  Integration 
if  no  odometry  movement  then 
bt  =  {ut  +  rbt-i)  /  {t  +  1) 
increment  r 

end  if 

vt  =  vt_i  +  dt  at 

0  =  Ut  -  h _ 

Vt  =  A/(it)^  +  {ytY 
if  Vt  >  Vmax  then 
Xt  =  Xt/vt 
Vt  =  Vt/vt 
end  if 

^  Vt_l)/2 


Even  with  the  corrections  applied,  the  most  noticeable  issue  with  integrating  twice 
is  that  minor  errors  in  at  cause  (5st™'“  to  grow  with  each  measurement,  and  the  path 
obtained  using  this  difference  drifts  more  rapidly  with  each  measurement. 

3.4  Control  Inpnt  Kalman  Filter 

The  MINS  system  combines  three  separately  obtained  pose  differences  to  deter¬ 
mine  the  final  control  input  Sst  at  each  time  t.  It  combines  from  the  IMU, 

5st^°  from  the  cameras,  and  from  odometry  in  a  linear  Kalman  hlter.  In  the 

same  manner  as  previously  demonstrated,  a  Kalman  hlter  allows  the  three  inputs 
to  produce  a  combined  pose  difference  Each  of  these  inputs  have  different  un¬ 
certainties  that  vary  between  measurements,  as  changing  environments  affect  each 
sensor  differently. 

Kalman  hltering  is  advantageous  to  a  simpler  combination  because  MINS  can 
update  the  hlter  with  diherent  inputs  at  diherent  time  steps.  This  is  critical  because 
each  sensor  measures  at  its  own  interval  and  is  not  always  constant.  A  Kalman  hlter 
continues  to  approximate  the  pose  with  any  combination  of  the  sensors  available. 
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This  means  the  hlter  continues  to  estimate  given  only  one  input,  even  if  both  of  the 
other  inputs  are  unavailable  or  untrusted.  The  hlter  carries  only  its  current  pose  s* 
as  its  state  x^,  with  covariance  'Ef 

Kalman  hlters  are  often  described  as  a  series  of  predictions  and  observations  that 
maintain  x*  and  Et  [43].  The  IMU  collects  data  fast,  so  MINS  offers  to  the 
linear  Kalman  hlter  as  the  prediction.  After  hnishing  each  inertial  integration,  the 
hlter  prediction  updates  the  state  prior  x^  in  Equation  29.  Here,  3x3  matrix  F 
directly  relates  the  prediction  to  state  variables  xt  as  an  identity. 


xt  =  F(xi_i  +  where  F  =  I3  (29) 

To  obtain  the  covariance  prior  E^,  the  Kalman  hlter  considers  the  variances  of 
the  inertial  measurements  aacc  and  Cgyro  along  the  diagonal  of  Q.  The  Kalman  hlter 
propagates  these  variances  to  Et  in  Equation  30,  where  G  relates  the  variances  in  Q 
to  the  values  in  prior  covariance  matrix  Et. 


Et  =  GQG^ 


1  0 

r  1 

r  1 

^acc  0 

1  1  0 

1  0 

0  ^gyro 

0  0  1 

0  1 

^acc 

^acc 

0 


^acc  0 

^acc  0 

0  ^  gyro 


(30) 


Prior  state  xt  produces  a  path  that  considers  only  measurements  from  the  IMU. 
When  odometry  or  egomotion  inputs  are  available,  MINS  provides  the  Kalman  hlter 
with  and  Sst^°  as  observations.  Each  observation  brings  the  prior  state  and 

covariance  to  their  posterior  values.  The  Kalman  hlter  performs  the  same  process 
for  odometry  and  image  updates,  but  maintains  a  diherent  error  model  R*  for  the 
diherent  inputs  at  time  t. 

For  odometry,  the  Kalman  hlter  scales  the  measured  pose  diherence  by  the  time 
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6t  since  the  last  odometry  update.  Equation  31  displays  the  observation  variance  of 
odometry  with  a  constant  heading  variance  of  27i  radians. 

^t\S^odom\  Q  Q 

=  0  0  (31) 

0  0  27r 

For  image  updates,  the  Kalman  hlter  uses  the  number  of  landmarks  k  tracked  in 
the  current  image  pair  as  a  basis  the  variance,  both  position  coordinates  and  heading. 
Equation  32  displays  the  observation  variance  of  an  egomotion  update  R®^°. 

10/A;  0  0 

Rr  =  0  10/A;  0  (32) 

0  0  IOOtt/A; 

The  Kalman  hlter  uses  <58^  and  R^  for  the  appropriate  sensor  to  make  an  obser¬ 

vation  of  current  state  xj.  Just  as  F  relates  the  prediction  directly  to  the  state,  3x3 
matrix  H  relates  the  observation  to  the  state  and  is  also  an  identity.  Equation  33 
shows  the  Kalman  gain  Kj  and  posterior  covariance  obtained  from  a  function  call 
to  a  U-D  decomposition  method  [7].  This  function  exists  within  the  hltering  library. 

(Ki,  Si)  =  ObserveUD(Si,  H  =  R,  Rt)  (33) 

To  observe  its  state,  the  Kalman  hlter  multiplies  each  element  of  by  each 
element  of  a  measurement  residual.  The  residual  is  the  pose  diherence  between  the 
prior  state  and  the  observed  pose.  Equation  34  describes  the  calculation  for  obtaining 
the  posterior  state  Xj  from  the  prior  state  Xt. 

xt  =  xt -1- Ki((5st  -  Xi)  (34) 
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With  the  observation  equations  completed,  the  Kalman  hlter  stores  a  posterior 
state  Xi  and  covariance  S*  for  time  t.  As  MINS  repeatedly  updates  its  hlter  with 
desired  output  xt  always  available,  it  becomes  the  total  pose  difference  5st.  In  this 
way,  the  linear  Kalman  hlter  generates  a  localization  path  for  mapping.  The  path 
created  by  each  5st  is  considered  the  output  of  the  MINS  system. 


3.5  FastSLAM  Implementation 

MINS  provides  its  pose  diherence  5st  to  the  particle  hlter  as  control  input  Ut 
with  an  associated  3x3  covariance  Sj.  With  x^  =  =  Ut,  the  RBPF  has  a  two 

dimensional  path  to  use  for  its  FastSLAM  implementation. 


3.5.1  Motion  Model. 

The  hrst  FastSLAM  step  dictates  the  particles  be  propagated  according  to  the 
control  input  Ut  from  the  MINS  path.  This  control  input  corresponds  to  pose  change 
in  the  odometry  motion  model  from  previous  work  [61].  By  using  the  Kalman 
hlter,  FastSLAM  gains  covariance  St,  containing  values  not  available  when  using  a 
single  sensor.  Since  the  goal  of  particle  propagation  is  to  model  the  pose  uncertainty, 
FastSLAM  incorporates  covariance  matrix  St  into  the  odometry  motion  model. 

Due  to  the  error  models  of  the  Kalman  hlter  inputs,  each  S  is  block-diagonal  in 
addition  to  being  symmetric.  It  separates  the  coordinates  from  the  heading  standard 
deviation,  taking  the  values  shown  in  Equation  35. 


^xy  0 
(^xy  (^1  0 


(35) 


[  0  ^  <^l\ 

The  least  intrusive  manner  to  incorporate  the  values  in  S  is  to  modify  the  motion 
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model  parameters.  This  does  not  affect  the  odometry  motion  model  operation,  which 
remains  useful  as  it  continues  to  accurately  describe  wheeled  vehicle  movement.  The 
uncertainty  in  rotation  by  translation  0:2  and  translation  by  rotation  0:4  model  ex¬ 
pected  vehicle  error,  but  correspond  with  the  zeros  of  S.  Because  of  this,  FastSLAM 
does  not  modify  these  but  uses  conhgured  constants  instead. 

Uncertainty  in  translation  by  translation  and  rotation  by  rotation  ai  corre¬ 
spond  respectively  with  the  upper  and  lower  blocks  of  S.  Therefore,  FastSLAM 
obtains  its  four  motion  model  parameters  in  Equation  36  by  modifying  new  parame¬ 
ters,  Oi  through  04. 

tti  (0-0)01 

CI2  02 

as  ((7l  + ay  +  a^y)a3 

0:4  04 

It  is  worth  noting  that  the  units  are  not  consistent  using  this  method.  The  ele¬ 
ments  of  S  have  units  of  distance  and  heading,  corresponding  to  those  of  its  state, 
which  is  a  pose.  The  motion  model  uses  each  of  its  a  parameters  as  uncertainty  in 
pose  change,  which  has  units  of  change  in  distance  and  change  in  heading.  The  spe- 
cihc  elements  of  S  could  be  subtracted  between  times  t  and  t  — 1  to  obtain  comparable 
values,  but  FastSLAM  uses  Equations  35  and  36  due  to  implementation  details. 

FastSLAM  operates  the  motion  model  for  M  particles  as  described  in  the  previous 
chapter  using  each  of  these  a  parameters.  The  difference  is  that  conhgured  parameters 
ai  and  03  are  modihed  by  covariance  S.  After  each  particle  [m]  propagates  its  current 
pose  the  FastSLAM  algorithm  then  builds  the  corresponding  map  ©I’”]  and 
weights  particle  [m]  within  particle  set  P. 
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3.5.2  Measurement  Model. 


After  the  motion  model  discussed  in  the  last  chapter  propagates  each  particle,  the 
RBPF  must  rank  particle  set  P  using  the  available  sensor  data  combined  with  the 
current  state.  Thus,  the  RBPF  applies  a  sensor  measurement  model  to  calculate  an 
error  for  each  particle  [33].  A  more  correct  pose  propagation  should  reflect  a  better 
match  to  the  current  range  scan  when  compared  to  the  map  stored  by  that  particle. 
FastSLAM  then  uses  the  error  accumulated  through  the  current  time  t  to  resample 
the  particle  distribution. 

The  measurement  model  is  a  signihcant  contributor  to  the  FastSLAM  running 
time,  requiring  one  iteration  for  each  range  scan  for  each  particle  [m].  This  is  more 
of  a  factor  than  the  motion  model  or  resampling  operations,  which  do  not  consider 
the  LIDAR  range  scans. 

In  this  implementation,  the  sensor  data  consists  of  LIDAR  scans  and  each  parti¬ 
cle  maintains  a  current  state  of  the  map.  A  different  sensor  or  map  representation 
technique  requires  a  different  algorithm  that  provides  the  same  measurement  func¬ 
tionality.  As  an  intuitive  approach,  each  particle  in  this  implementation  stores  the 
map  as  an  occupancy  grid,  a  large  matrix  with  each  cell  of  size  Qres  containing  a 
probability  of  that  area  being  occupied  by  an  object. 

The  measurement  model  compares  the  current  LIDAR  scan  to  each  particle  map, 
measuring  consistency  with  the  environment.  Observing  the  environment  is  the  nec¬ 
essary  function  of  the  measurement  model,  but  its  accuracy  is  limited  by  the  map 
resolution  Qres- 

Each  LIDAR  scan  consists  of  a  range  at  an  angle  6s,  where  each  particle  has 
a  pose  and  map  used  in  this  model.  If  the  current  range  is  within 
valid  measurements  to  r^ax,  the  measurement  model  calculations  begin  with 
Equation  37  by  determining  the  current  scanned  position  qj  by  pose  addition. 
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(37) 


„  _  I 


Ts COs{0s) 

Ts  sin(6's) 


0 


The  model  then  makes  similar  calculations  to  those  used  for  applying  the  current 
ranges  to  the  map.  It  begins  by  calculating  the  ratio  p  of  range  scan  to  the 
occupancy  grid  resolution  g^es  in  Equation  38.  The  measurement  model  uses  p  as  a 
step  count  as  it  traces  each  scan  the  same  way  each  map  uses  p  in  to  build  its  grid. 


P  = 


Qres 


(38) 


The  measurement  model  uses  p  to  calculate  position  difference  coordinates  d  in 
Equation  39.  To  do  this,  it  scales  the  difference  between  si™'  and  qt  by 


d  = 


X 


y 


1 

p 


—  xl 


y^-yT 


(39) 


Because  the  map  uses  difference  d  as  an  interval  to  apply  scans,  the  measurement 
model  does  the  same  calculations  to  observe  the  map.  To  step  from  to  qt  along  the 
grid,  it  steps  along  the  scanned  line  using  coordinates  p  in  Equation  40.  FastSLAM 
builds  the  map  locations  along  the  scan  line  from  0  to  p  +  1  using  index  j,  so  the 
model  uses  these  positions  to  view  the  same  grid  coordinates. 


!/!”'  +  uysf 


(40) 


To  incorporate  the  map,  the  model  peeks  at  the  current  particle  occupancy  grid 
value  at  each  location  p.  If  the  cell  has  been  previously  observed  as  occupied 
when  the  current  scan  is  not  at  its  end,  the  measurement  model  updates  map  error 
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for  particle  [m]  in  Equation  41.  Here,  the  model  scales  by  occupancy  range 
Qpr  and  calculates  an  approximate  distance  from  index  j. 


[m]  f  (P  1  j')9res 
\  Qpr 

When  j  reaches  p  +  1,  the  model  expects  location  p  to  be  occupied.  If  this  cell 
has  been  previously  observed  as  unoccupied,  it  updates  the  map  error  for  particle 
[m]  using  Equation  42.  The  model  scales  by  occupancy  range  Qpr  and  uses  grid 
resolution  gres  for  a  minimum  distance. 

H  f  ^42) 

\  Qpr  ) 

To  keep  error  values  low  and  factor  recent  observations  more  than  earlier 
ones,  the  measurement  model  decays  the  previous  error  at  each  time  [52] .  It  does  this 
by  multiplying  constant  discount  factor  7  to  the  previous  error  before  adding 
current  error  in  Equation  43. 


W  — 


Q[xP][y^ 


=  el 


+  Q[x^ 


el’“l  =  leH  +  el-l  (43) 

Each  particle  error  is  calculated  given  the  current  range  scans  in  Algorithm  9. 
Particle  pose  has  already  been  propagated  for  time  t,  where  its  occupancy  grid 
0H  contains  range  scans  through  t  —  1.  This  model  uses  several  predehned  param¬ 
eters,  including  the  minimum  and  maximum  scan  ranges  and  Vmax,  as  well  as 
occupancy  grid  resolution  gres  and  discount  factor  7. 

Once  the  measurement  model  calculates  the  current  map  error  for  each  par¬ 
ticle,  it  is  hnished.  The  RBPF  accumulates  decayed  error  over  time,  to  capture  some 
errors  over  the  length  of  the  path  including  but  not  just  the  most  recent  measurement. 
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Algorithm  9  Range  Scan  Measurement  Model 
for  all  particles  [m]  G  P  do 

for  all  scans  with  range  at  angle  6s  do 

if  Ts  <  Tmin  Or  Tg  >  T^ax  then 

continue 

end  if 

Xl  =  Ts  COs{9s) 

=  Ts  sm{9s) 
qt  =  si”"'  +  [x?  yl  0]^ 

P  r^s/ fibres] 

xd  ^  ^x1  —  )  / p 

yd  = 

for  j  =  0  to  p  +  1  do 

xP  =  xj™'  +  {j)x‘^ 

yP  =  +  (j)yd 

if  J  <  P  and  0  [x^]  [p^]  I™']  >  0  then 

increase  el™'  by  0 [x^]  [p^] H (p  +  1  -  j){gres/9pr) 
break 

else  if  j  >  p  and  0  [x^]  [p^]  <  0  then 

increase  el™'  by  -0[a;P]  [pP]['"'(p^es/fl'pr) 

end  if 
end  for 

[ml  [ml  I  [ml 

el  =  7el_\  +  el 

end  for 
end  for 
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3.5.3  Map  Construction. 


Before  FastSLAM  computes  the  weighting  and  resamples,  each  particle  [m]  applies 
the  current  set  of  ranges  to  its  map  ©[”^1  [33].  This  process  is  similar  to  the  structure 
of  the  measurement  model,  as  it  calculates  Equations  37,  39,  and  40,  then  steps  along 
each  scan.  It  limits  scan  ranges  to  Vmax  when  calculating  until  it  reaches  the 
interior  loop.  When  applying  scans,  each  occnpancy  grid  value  is  decreased  along  the 
path  nntil  the  scan  stops  at  q^,  where  it  increases  the  occnpancy  valne.  The  process 
for  applying  the  current  range  scan  to  each  map  ©I™!  is  shown  in  Algorithm  10. 


Algorithm  10  Map  Constrnction 
for  all  particles  [m]  E  P  do 

for  all  scans  with  range  at  angle  6s  do 


if 

Ts  > 

Tmax  then 

x'^  = 

Pmax  COs{9s) 

y^  = 

r-max  sm{9s) 

else  if 

Ts  >  Tmin  then 

x‘^  = 

Ts  cos(6's) 

y^  = 

Ts  sin(6'J 

end  if 

q 

=  Sj  +  [x^  y^  0]^ 

P 

=  ks 

/  fi'r-es] 

xd  ^  ^x1  —  xl™' )  / p 

yd  = 

for  j  =  0  to  p  +  1  do 

xP  =  xj™'  +  j 

=  yt  +jy 

if  j  <  p  then 

decrease  <d[xP][yP]^'^^  by  (d/pp^) 

else  if  j  >  p  and  <  r^ax  then 
increase  &[xP][yP]^"^^  by  {8/gpr) 

end  if 
end  for 
end  for 
end  for 


Instead  of  measurement  model  error  calcnlations,  the  particle  decreases  its  ©I™! 
cell  valnes  along  each  coordinate  p  and  increases  its  cell  valne  at  the  last  scanned 
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coordinate  p.  Each  cell  has  gpr  possible  occupancy  values,  so  the  grid  requires  several 
scans  to  indicate  a  clear  value.  After  Algorithm  10,  cell  values  correspond  to  a 
likelihood  of  being  occupied. 

This  algorithm  is  the  most  signihcant  contribution  to  the  computation  time  of 
the  FastSLAM  implementation.  This  is  due  first  to  the  loop  for  each  time  step  for 
each  particle  for  each  range  scan.  This  is  the  same  requirement  as  the  measurement 
model,  but  applying  scans  to  maps  also  requires  an  additional  loop  as  the  algorithm 
walks  down  the  length  of  each  scan.  At  each  point,  one  cell  of  the  current  particle 
map  must  be  accessed  and  written,  resulting  in  additional  time  for  the  significant 
memory  requirement.  The  FastSLAM  creators  have  presented  a  method  to  reduce 
this  requirement  by  sharing  map  storage  possible  through  resampling,  though  this  is 
not  implemented  here  [46]. 

Because  there  are  M  particles,  the  set  of  all  maps  represents  0  in  the  same  way 
the  poses  of  P  represent  the  pose  distribution.  As  some  particles  are  more  accurate 
than  others,  FastSLAM  weights  and  resamples  P  to  achieve  a  better  distribution  over 
its  current  pose  St  and  map  0. 

3.5.4  Particle  Weighting. 

Before  the  RBPF  can  resample,  it  must  assign  a  weight  to  each  particle 
[m]  at  time  t.  The  RBPF  is  compatible  with  any  method  used  to  weight  each  par¬ 
ticle,  as  long  as  the  weights  are  normalized  such  that  they  sum  to  1.  FastSLAM 
begins  by  subtracting  the  minimum  values  from  each  map  error  as  calculated 
by  the  measurement  model.  In  doing  this,  it  distinguishes  small  errors  from  large 
errors  appropriately.  Their  difference  results  in  a  total  particle  error  shown  in 
Equation  44. 
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(44) 


To  ensure  a  balance  of  low  error  particles,  FastSLAM  limits  to  a  minimum 
value  of  2.  This  prevents  particles  from  having  an  error  much  lower  than  the  rest  and 
a  resultant  undesired  spike  in  the  posterior  distribution.  Limiting  the  error  value,  and 
thus  the  weighting,  discourages  the  resampling  algorithm  from  replacing  too  many 
particles  with  a  copy  of  the  same  single  particle. 

FastSLAM  then  creates  the  weight  for  each  particle  from  Equation  45 
calculates  the  weight  of  each  particle,  taking  the  reciprocal  of  and  normalizing 
the  result.  This  process  assigns  each  particle  [m]  a  weight  at  time  t. 


m 

wi  = 


M 

m=l 


-1 


(45) 


Once  FastSLAM  weights  the  set  of  particles  P,  the  final  step  is  to  resample  the 
particle  distribution.  FastSLAM  uses  an  adaptive  resampling  technique  as  done  in 
similar  research  successful  with  grid  mapping  [29].  FastSLAM  executes  a  test  after 
each  weighting  to  determine  if  a  resampling  step  is  needed.  This  not  only  reduces  the 
required  computation,  but  also  reduces  the  chances  of  a  good  particle  being  replaced 
from  resampling  too  often. 

FastSLAM  tests  for  resampling  using  each  particle  weight  It  calculates  the 

number  of  effective  particles  rrieff  by  determining  a  sum  of  the  squared  weights. 
The  weights  are  normalized,  but  their  squares  allow  FastSLAM  to  find  how  different 
the  highly  weighted  particles  are  from  the  lower  ones.  Equation  46  describes  the 
calculation  for  number  of  effective  particles  rUe//. 


^eff 


(46) 
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FastSLAM  then  performs  the  resampling  test.  If  me//  is  less  than  half  the  par¬ 
ticles  M/2,  then  it  carries  out  the  resampling  process.  If  me//  is  greater  than  M/2, 
FastSLAM  does  not  resample  P  and  all  steps  are  hnished  for  time  t. 

3.5.5  Resampling. 

FastSLAM  uses  the  resampling  process  to  adjust  its  particle  distribution.  This 
implementation  uses  a  standard  SIR  process  in  a  way  that  is  both  nonlinear  and  non- 
Gaussian  [27].  It  begins  by  finding  a  normalized  cumulative  sum  of  likelihood  weights 
Wcum  using  the  Kahan  algorithm  designed  for  this  purpose  [37].  It  then  chooses  a 
particle  once  for  each  time  its  cumulative  weight  intersects  with  a  random  draw 
from  a  uniform  distribution  U  from  0  to  1.  The  algorithm  carries  a  O(MlogM) 
complexity  from  sorting  Ur- 

With  the  number  of  times  to  resample  each  particle  prs,  FastSLAM  replaces  each 
state  X  and  map  0  for  lower  weighted  particles  with  those  from  higher  weighted  ones. 
The  total  process  for  updating  the  RBPF  is  shown  in  Algorithm  11. 

Once  FastSLAM  resamples  particle  set  P  it  updates  the  RBPF,  completing  all 
necessary  FastSLAM  steps.  An  attractive  aspect  of  an  online  SLAM  implementation 
is  that  a  mapped  solution  makes  itself  available  at  any  time  t.  Here,  FastSLAM 
provides  and  map  ©[”^1  at  all  times,  where  particle  [m]  E  P  has  the  lowest  error 
and  thus  highest  weight 

3.6  Summary 

With  all  calculations  at  time  t  complete,  MINS  and  FastSLAM  proceed  to  the 
next  time  t  -|-  1  accepting  the  subsequent  sensor  inputs.  The  next  time  step  begins 
by  reading  IMU  data,  integrating  its  accelerations  to  velocities,  removing  the  angular 
bias,  then  integrating  velocities  to  IMU  pose  change  This  pose  change  pro- 
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Algorithm  11  RBPF  Resampling 

'^cum:  C,  J,  Ms,  Urs  0 

for  all  particles  [”^1  do 

y  ^  yj[m]  _  c  II  Kahan  algorithm 
t  '^cum  y 
C  t  Wcum  y 

Wcum  =  =  t 

end  for 

Ur  =  M  sorted  samples  from  U{0, 1) 

U,.  (Wcum)^r 

for  all  particles  [m]  do 

Pres  =  0  //  assume  not  resampled  until  found 

if  Us  <  M  and  then 

increment  Urs 

repeat 

increment  Pres  and  Ug  j  j  count  resamples 
until  Us  <  M  and  <  tyl™! 

end  if 

[j] 

Prs  =  Pres 

increment  j 

end  for 

j  =  M  II  update  particle  set  based  on  resampling 

for  i  =  M  to  1  do 
if  pl\  >  0  then 

decrement  j 
xld  =  xW 
0[il  =  0W 

end  if 
end  for 

for  all  particles  [m\  do 

k  =  ptf^  / /  replicate  live  samples 

if  /c  >  0  then 
repeat 

xW  =  x[^’] 

0W  =  0[i] 

increment  i,  decrement  k 

until  k  >  0 
increment  j 
end  if 
end  for 
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vides  the  prediction  update  the  linear  Kalman  hlter  maintaining  position  state  x^+i 
and  covariance  St+i.  This  continues  until  an  egomotion  or  an  odometry  update  is 
available. 

When  an  image  is  available,  MINS  extracts  image  features  from  stereo  cameras. 
It  then  matches  features  in  both  images  at  t  +  1  with  those  in  both  images  at  t,  and 
calculates  an  egomotion  pose  difference  The  linear  Kalman  filter  accepts 

as  an  observation  update.  When  an  odometry  update  is  available,  MINS  measures 
the  odometry  difference  as  directly  from  the  sensor.  It  provides  to  the 

linear  Kalman  hlter  as  an  observation  update  as  it  does  for  an  egomotion  update. 

Once  the  Kalman  hlter  has  been  updated,  its  state  x^+i  becomes  the  current  pose 
of  the  MINS  system  s^+i.  This  research  then  uses  this  pose  as  the  control  input  Ut+i 
and  the  current  LIDAR  range  scan  as  the  measurement  input  Zt+i  to  a  FastSLAM 
implementation. 

The  FastSLAM  implementation  carries  out  the  odometry  motion  model  using  the 
Kalman  hlter  covariance  Si+i  within  its  error  parameters.  This  creates  a  distribu¬ 
tion  of  particle  set  P,  representing  M  proposed  current  poses  s^+i.  FastSLAM  then 
executes  a  measurement  model,  incorporating  LIDAR  measurement  scan  Zt^i  and 
each  map  for  each  particle  [m].  The  measurement  model  calculates  a  map  error 
e[+i  for  each  particle,  which  the  FastSLAM  converts  into  a  normalized  weight 
Finally,  if  rrieff  does  not  reach  M/2,  FastSLAM  resamples  particle  set  P  using  the 
SIR  method.  This  completes  the  steps  for  time  t  +  1. 

Each  subsequent  step  follows  the  same  procedure  with  new  sensor  data,  mapping 
the  data  at  each  new  time  t.  MINS  and  FastSLAM  continue  this  cycle  until  all  inputs 
are  exhausted.  At  this  point,  FastSLAM  produces  map  ©f™'^  from  the  particle  [m] 
with  the  highest  weight  as  its  current  belief  of  the  environment. 
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IV.  Results  &:  Analysis 


This  research  seeks  to  execute  the  implementation  on  real  data  gathered  by  a 
mobile  ground  vehicle.  The  MINS  system  operates  by  design  on  two  dimensional 
data  gathered  by  stereo  cameras,  an  inertial  unit,  and  wheeled  odometry.  In  the 
same  way,  FastSLAM  desires  two  dimensional  LIDAR  scans  corresponding  to  points 
on  the  MINS  output  path. 

To  accomplish  this,  a  vehicle  outfitted  with  all  sensors  explores  an  indoor  hallway 
environment  and  stores  all  input  data  for  the  implementation.  The  implementation 
configures  the  various  algorithms  within  MINS  and  FastSLAM  to  produce  paths  and 
occupancy  grid  maps  as  the  SLAM  solution  of  this  research. 

This  chapter  discusses  the  specifics  of  the  data  gathering  and  values  used  in  ex¬ 
periments.  Next,  it  presents  the  results  of  the  feature  extraction  and  sensor  input 
paths  used  as  MINS  input.  Following  this,  it  compares  the  path  output  from  the 
MINS  system  to  the  similar  FV-SIFT  navigation  system.  Lastly,  it  displays  grid 
maps  produced  by  FastSLAM  and  discusses  their  differences. 

4.1  Testing  Procedure 

The  test  data  is  collected  from  the  Pioneer  P2-AT8  vehicle  shown  in  Figure  12. 
The  P2-AT8  provides  internal  odometry  on  skid  steering  wheels  fitted  with  indoor 
tires.  Mounted  on  the  vehicle  are  a  SICK  LMS  200  LIDAR  unit,  which  measures 
ranges  in  a  180°  horizontal  sweep  at  1°  increments,  giving  it  centimeter  resolution  at 
a  distance  up  to  Tmax- 

The  vehicle  also  carries  the  necessary  hardware  for  FV-SIFT  to  operate.  This 
hardware  consists  of  two  PixeLINK  PL-A741  machine  vision  cameras  with  1280  x  960 
resolution  fitted  with  Pentax  lenses  giving  a  90°  field  of  view.  This  is  combined 
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Figure  12.  The  Pioneer  P2-AT8  vehicle  used  in  testing.  It  carries  stereo  cameras  and 
an  IMU  above  the  LIDAR,  with  an  external  computer  toward  the  rear  of  the  chassis. 

with  a  MicroRobotics  MIDG  II  consumer  grade  MicroElectricalMechanical  Systems 
(MEMS)  IMU  to  provide  the  measurement  data. 

Due  to  the  intensive  requirements  of  image  processing,  an  external  computer  with 
a  2.0  GHz  Gore2Duo  processor  and  4  GB  of  memory  can  process  the  left  and  right 
images.  The  machine  is  capable  of  executing  a  parallelized  SURF  feature  extraction 
algorithm  for  FV-SIFT  as  it  collects  data  by  utilizing  Single  Instruction  Multiple  Data 
(SIMD)  calculations  on  an  Nvidia  9800GTX+  Graphics  Processing  Unit  (GPU)  [64]. 
This  computer  conhguration  records  images  and  IMU  data;  it  is  implemented  for  the 
FV-SIFT  path  but  not  the  MINS  path  presented  in  the  remainder  of  this  chapter. 
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The  internal  vehicle  machine,  a  1.6  GHz  Pentium  M  with  1.5  GB  of  memory, 
records  odometry  and  laser  ranges  simultaneously  on  a  VersaLogic  EBX-12  board. 
The  external  computer  connects  to  the  internal  one  via  Ethernet  cable  to  communi¬ 
cate  EV-SIFT  output  for  comparison  and  to  synchronize  time  between  the  sensors. 
This  arrangement  allows  each  computer  to  collect  the  desired  input  data  into  log  hies 
for  future  use. 

The  vehicle  path  resulting  from  a  manually  driven  test  run  serves  as  the  data  set 
presented  in  the  remainder  of  this  chapter.  The  vehicle  records  data  for  9  minutes 
and  47  seconds  while  both  stationary  and  moving  in  this  run,  and  for  10  minutes  and 
42  seconds  in  the  additional  data  collection  run  presented  at  the  end  of  this  chapter. 

For  both  runs,  the  environment  consists  of  an  indoor  hallway  with  a  tile  hoor  that 
measures  2.5  meters  wide.  The  vehicle  begins  in  the  northeast  corner  before  traveling 
left  around  a  rectangle  approximately  30  x  40  meters  in  size.  After  returning  to  its 
starting  location,  the  vehicle  makes  a  right  turn  into  a  room  and  stops,  ending  the 
hrst  run. 

This  run  creates  a  classic  SLAM  situation  of  closing  a  large  loop.  The  sensors 
must  be  accurate  enough  and  the  number  of  particles  must  be  high  enough  to  ensure 
the  position  may  be  resolved  correctly  once  the  vehicle  returns.  A  highly  trusted  pose 
estimate  allows  the  RBPF  to  use  a  minimal  number  of  particles  to  cover  a  smaller 
distribution,  but  a  larger  loops  require  more  particles  to  cover  a  constantly  growing 
distribution.  This  is  true  for  any  FastSLAM  implementation,  regardless  of  input 
sensors  available  or  map  storage  strategy. 

4.2  Implementation  Details 

The  hardware  saves  all  camera  images  for  later  use  as  portable  graymap  hies  in 
Netpbm  format,  and  saves  all  other  collected  sensor  data  in  text  hies.  With  data 


from  all  sensors  available,  the  physical  implementation  consists  of  two  main  phases. 

The  first  phase  deals  strictly  with  the  image  processing,  feature  extraction,  and 
egomotion  within  MINS.  This  process  is  done  using  Matlab®  script  hies  and  running 
a  compiled  executable  to  extract  SIFT  features,  not  yet  implemented  on  the  GPU. 
This  produces  a  text  hie  consisting  of  egomotion  positions. 

The  second  phase  reads  all  text  hies,  implements  the  linear  Kalman  hlter  and 
FastSLAM,  storing  the  occupancy  grid  maps  as  Netpbm  graymap  format  hies.  This 
phase  is  a  standalone  Win32  application  built  on  the  Bayes++  hltering  library  using 
Microsoft  Visual  Studio  2005  in  the  C++  language. 

The  implementation  reads  one  set  of  stereo  images,  inertial  measurements,  odom- 
etry  data,  and  laser  ranges  simultaneously.  Even  though  the  computers  synchronize 
their  clocks,  the  sensors  collect  and  store  data  at  diherent  rates.  Operating  at  a  con¬ 
stant  50  Hz,  the  IMU  is  the  fastest  sensor  and  updates  the  Kalman  hlter  prediction 
at  this  frequency. 

The  vehicle  stores  its  odometry  and  LIDAR  log  hies  at  an  average  of  12  Hz. 
MINS  hrst  provides  its  Kalman  hlter  with  an  odometry  observation  at  this  frequency. 
Because  the  measurement  model  applies  the  range  scan,  FastSLAM  then  performs 
the  sequential  steps  described  in  the  previous  chapter.  To  restate,  MINS  predicts 
its  Kalman  hlter  with  each  IMU  measurement  until  there  is  an  odometry  reading 
available,  when  it  observes  the  Kalman  hlter  and  continues  with  one  time  step  of 
FastSLAM  operation. 

Because  of  the  time  required  to  calculate  and  save  them,  images  are  only  available 
at  2  Hz.  While  the  IMU  and  odometry  predict  and  observe  for  the  linear  Kalman 
hlter  at  a  higher  frequency,  the  feature  extraction  and  egomotion  algorithms  only 
update  the  Kalman  hlter  when  available.  The  hlter  design  prevents  this  diherence  in 
update  frequencies  from  adversely  ahecting  its  operation. 
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4.3  Algorithm  Parameter  Settings 


The  methodology  discussed  many  values  that  either  model  the  hardware  used  or 
affect  the  output  as  desired.  Thus,  the  MINS  and  FastSLAM  implementations  contain 
many  unique  parameters,  which  must  be  set.  These  include  models  of  the  cameras, 
egomotion  settings,  inertial  integration  settings,  and  parameters  for  the  motion  and 
measurement  models  within  FastSLAM. 

4.3.1  Camera  Model. 

The  model  of  the  stereo  cameras  contains  many  values  specihc  to  the  hardware 
used  in  the  feature  tracking  and  egomotion  implementation.  All  are  properties  of  the 
model  and  settings  of  the  cameras  on  the  left  L  and  right  R  sides.  This  research 
does  not  seek  to  modify  the  camera  model,  as  it  does  not  change  any  hardware 
settings  [23,  65]. 

These  values  include  the  nc  x  me  camera  pixel  resolution  and  offset  oc  from  the 
central  IMU  position.  It  also  includes  the  principal  point  pc  and  focal  length  fc 
vectors,  each  with  an  x  and  y  value,  and  the  five  CalTech  distortion  model  values 
kc  without  units.  Square  matrices  Cc  and  Tc  are  both  Direction  Cosine  Matricies 
(DCMs)  in  three  dimensions  used  to  relate  image  position  to  a  coordinates  as  done 
in  the  previous  chapter.  Table  1  provides  the  camera  model  parameters  used  in  the 
feature  tracking  algorithms. 

All  values  in  Table  1  are  unchanged  from  earlier  hndings  [65].  The  models  for  the 
left  L  and  right  R  cameras  include  the  same  parameters,  and  each  is  used  to  hnd  the 
feature  locations  from  each  stereo  image  pair. 
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Table  1.  Camera  Parameters 


4.3.2  Egomotion  Model. 

In  addition  to  existing  hardware  models,  the  previous  chapter  also  contains  several 
manually  set  parameters.  Each  affects  a  different  aspect  of  MINS  operation  in  the 
egomotion  calculations.  This  includes  the  ratio  g  between  the  best  and  second  best 
feature  matches,  the  maximum  allowable  angle  difference  d  for  a  feature  match,  and 
the  fraction  of  standard  deviations  ai  a  feature  location  may  move  within  to  remain 
a  tracked  landmark.  Table  2  lists  each  of  these  values. 


Table  2.  Egomotion  Parameters 


Parameter 

Value 

Units 

Q 

0.6 

none 

d 

1.5 

deg 

CTl 

0.75 

m 
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Each  of  these  values  was  manually  tuned  to  remove  all  detectable  false  matches  in 
sample  images,  and  maintain  at  least  one  tracked  feature  between  each  image  pair. 
These  parameters  result  in  the  egomotion  path  used  as  an  observation  input  to  the 
MINS  linear  Kalman  hlter. 


4.3.3  Inertial  Model. 

For  inertial  integration,  the  IMU  operates  at  a  constant  frequency  of  50  Hz.  Thus, 
MINS  models  this  with  time  period  dt  as  the  it  processes  each  IMU  measurement. 
The  vehicle  maximum  velocity  Vmax  is  known  by  the  motor  settings  and  the  vehicle 
operator  while  collecting  data,  so  MINS  uses  parameter  in  the  inertial  integration. 
The  IMU  standard  deviations  of  its  internal  accelerometer  cTacc  and  gyroscope  (Jgyro 
for  use  within  the  Kalman  hlter  are  taken  from  the  physical  specihcations  of  the 
unit  [65].  Table  3  lists  each  of  these  values  as  used  in  the  previous  chapter. 


Table  3.  Inertial  Parameters 


Parameter 

Value 

Units 

dt 

0.02 

s 

"^max 

1.0 

m/s 

^acc 

0.196 

m/s^ 

^  gyro 

0.0087 

rad/s 

These  parameters  result  in  an  integrated  IMU  path  that  provides  the  linear 
Kalman  hlter  prediction.  This  covers  all  parameters  used  through  MINS  operation. 

In  comparison  to  the  egomotion  and  inertial  parameters,  the  odometry  hardware 
measures  its  path  directly  and  thus  does  not  require  any  processing.  MINS  does  not 
model  odometry  error  outside  of  the  settings  discussed  in  Chapter  3.  The  error  in 
combined  input  path  Ut  is  instead  handled  in  FastSLAM  by  the  motion  model. 
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4.3.4  Motion  Model. 


Any  FastSLAM  solution  contains  several  parameters  that  change  calculations  in 
the  motion  and  measurement  models.  In  the  motion  model,  the  only  parameters  are 
the  four  values,  02  and  04  directly,  with  oi  and  03  modihed  by  the  covariance  values. 
Table  4  displays  the  four  motion  model  parameters,  with  units  converting  distances 
to  headings  and  back. 


Table  4.  Motion  Parameters 


Parameter 

Value 

Units 

(X\ 

0.005 

rad/rad 

Oj2 

0.001 

rad/m 

as 

1.0 

m/m 

04 

0.05 

m/rad 

The  operator  changes  these  motion  values  to  model  the  believed  error  in  the 
vehicle  odometry.  In  this  research,  control  input  Ut  is  the  MINS  system  path,  so 
these  parameters  instead  model  the  error  of  this  system.  The  researcher  selected 
these  values  to  attain  the  desired  particle  distribution  given  the  error  of  the  MINS 
path. 

4.3.5  Measurement  Model. 

The  measurement  model  works  with  any  range  scan  and  any  size  grid.  The  LIDAR 
range  minimum  and  maximum  come  directly  from  the  hardware  specihca- 
tions.  However,  the  researcher  sets  occupancy  grid  values  depending  on  the  scale  of 
the  data.  In  these  results,  the  grid  is  x  Qy  meters  in  size  with  resolution  g^es-  Within 
each  cell,  the  occupancy  grid  stores  a  value  within  the  range  gpr  as  a  probability  of 
being  occupied.  This  and  discount  factor  7  do  not  have  units.  Table  5  provides  each 
of  the  parameters  used  in  the  FastSLAM  implementation  measurement  model  and 
map  builder. 
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Table  5.  Measurement  Parameters 


Parameter 

Value 

Units 

^min 

0.01 

m 

'1"  max 

8.10 

m 

9x 

70.0 

m 

9y 

70.0 

m 

9res 

0.1 

m 

9pr 

256 

none 

7 

0.99 

none 

Because  and  are  specific  to  the  sensor,  the  grid  is  the  only  entity  that 
affects  the  measurement  model  parameters.  As  discussed  later  in  this  chapter,  the 
occupancy  grid  settings  depend  primarily  on  computer  memory  available.  This  grid 
resolution  Qres  value  was  previously  identihed  as  the  best  tradeoff  between  detail  and 
effectiveness  [61].  Discount  factor  7  values  are  typically  large  fractions,  with  this 
value  experimentally  determined  to  result  in  the  desired  total  error  values  [52].  Using 
these  parameters,  FastSLAM  produces  the  resulting  maps  presented  in  this  chapter. 


4.4  Path  Comparison 

All  three  linear  Kalman  hlter  input  paths;  stereo  image  egomotion,  inertial  integra¬ 
tion,  and  vehicle  odometry,  are  displayed  in  Figure  13.  This  visualizes  the  strengths 
and  weaknesses  of  each  sensor  and  supports  the  efforts  of  this  research  to  combine 
the  three  paths. 

The  path  obtained  by  egomotion  does  not  initially  appear  to  be  particularly  ac¬ 
curate.  Nonetheless,  it  displays  motion  when  expected  and  travels  in  the  known 
direction  when  it  tracks  a  reasonable  number  of  features.  When  the  vehicle  reached 
the  southeast  corner  after  traveling  three  quarters  of  the  loop,  it  tracks  one  to  four 
features  until  facing  north  again.  This  results  in  a  large  discrepancy  and  is  a  visible 
part  of  the  egomotion  path. 
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Figure  13.  Egomotion,  inertial  integration,  and  odometry  paths.  They  are  plotted  over 
the  building  floor  plan  with  an  approximate  true  path  for  reference. 

The  IMU  measures  direction  and  overall  motion  with  surprising  consistently.  How¬ 
ever,  it  is  much  less  capable  of  measuring  distances  as  the  integration  results  in  a 
slight  exaggeration  of  velocity  and  a  larger  one  in  distance.  Thus,  the  inertial  path 
is  much  larger  than  that  actually  traveled  but  appears  similar  in  shape.  This  path 
also  contains  a  significant  error,  once  the  vehicle  returns  to  the  northeast  corner.  The 
integration  translates  stationary  rotations  to  the  left  then  back  to  the  right  as  motion, 
resulting  in  false  motion  at  this  point  in  its  path. 

The  odometry  path  appears  to  be  the  opposite  to  the  inertial  path.  Whereas  by 
its  nature  it  measures  a  very  accurate  distance  traveled,  it  is  subject  to  compounding 
errors  in  heading.  An  uncorrected  balance  problem  with  the  drive  motors  and  steering 
control  of  this  specihc  test  vehicle  results  in  a  pull  to  the  left  while  driving  forwards. 
As  a  result,  the  odometry  measures  motion  in  a  straight  line  while  the  vehicle  keeps 
a  slight  turn.  The  manufacturer  provides  a  means  to  compensate  the  wheel  encoders 
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and  configure  the  odometry  for  this,  but  this  conhguration  was  inoperative  and  un¬ 
available.  The  operator  was  forced  to  repeatedly  correct  the  vehicle  to  the  right  as  it 
drove  through  the  hallways,  resulting  in  the  path  shown. 

Knowledge  of  the  sensor  characteristics  allows  MINS  to  conhgure  its  linear  Kalman 
hlter  values  to  take  advantage  of  the  desired  aspects  of  each  sensor.  The  properties 
discussed  are  as  a  result  of  the  general  sensor  environment  and  the  implementation 
used.  Because  of  this,  they  are  not  likely  to  change  from  one  test  to  another  and 
show  promise  in  future  tests. 

MINS  provides  the  three  motion  estimates  from  previous  time  f  — 1  to  current  time 
t  as  input  to  its  linear  Kalman  hlter.  After  it  processes  each  observation  as  presented 
in  the  previous  chapter,  its  state  produces  the  next  step  in  the  MINS  path  and 
the  FastSLAM  control  input  Uf.  Figure  14  displays  the  MINS  path  alongside  that 
from  FV-SIFT  [23,  65],  which  operates  in  real-time  using  an  EKF,  but  cannot  access 
odometry  as  MINS  does. 

Both  MINS  and  FV-SIFT  output  values  are  suitable  for  navigation  requirements 
in  local  environments  without  using  GPS.  However,  the  differences  between  the  paths 
are  worth  noting.  For  this  particular  data  set,  environment  noise  affects  the  images 
early  in  the  path  near  the  northeast  corner.  FV-SIFT  tracks  a  limited  number  of 
twelve  image  features,  and  with  these  covered,  can  only  rely  on  the  IMU  for  input 
until  it  can  rediscover  features.  It  recovers  from  this  error,  but  not  after  recording  a 
jagged,  inaccurate  path  undesirable  for  a  SLAM  solution. 

The  MINS  system  presented  in  this  research  does  not  suffer  the  same  consequences 
given  in  the  same  input,  as  consistent  odometry  provides  an  additional  input  that 
takes  over  the  linear  Kalman  hlter  with  the  smallest  variance.  This  result  is  also 
displayed  in  other  corners,  where  IMU  integration  measures  false  motion  and  feature 
tracking  hnds  a  new  set  of  features. 
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Figure  14.  Paths  from  the  MINS  system  and  the  FV-SIFT  program.  The  background 
is  a  building  floor  plan  with  approximate  truth  shown  for  reference. 

Comparing  the  two  paths  to  truth  is  the  only  way  to  obtain  an  error,  as  the  survey 
points  are  the  only  trusted  data.  The  path  start,  hnish,  and  each  of  the  corners 
correspond  to  numbered  survey  point  locations  within  the  hallway.  The  distance 
between  the  current  pose  and  the  survey  point  gives  the  path  error  at  that  location 
and  time.  Table  6  displays  this  error,  in  meters,  at  each  known  point,  and  includes  a 
mean  of  the  distances. 

Overall,  the  MINS  path  hnished  almost  a  half  meter  farther  away  than  the  FV- 
SIFT  one,  but  maintains  a  pose  closer  to  truth  through  most  of  its  path.  The  errors 
in  Table  6  do  not  include  errors  while  traveling  along  hallways,  including  the  largest 
discrepancy  in  the  existing  system  after  the  hrst  pass  of  point  34. 

When  taking  the  northwest  corner,  the  FV-SIFT  path  is  over  4  meters  away,  and 
approximately  10  meters  from  truth  for  the  duration  of  the  southern  hallway.  In  fact, 
its  low  error  upon  returning  at  the  last  two  survey  points  is  merely  a  coincidence  of 
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Table  6.  Navigation  Filter  Error 


Map 

Location 

Survey 

Point 

FV-SIFT 
error  (m) 

MINS 
error  (m) 

Start 

50 

0.0 

0.0 

NE 

34 

0.548 

0.261 

NW 

44 

4.429 

0.482 

SW 

83 

10.003 

3.716 

SE 

64 

9.788 

5.413 

NE 

34 

2.931 

5.311 

Finish 

23 

5.709 

6.185 

mean 

4.772 

3.052 

the  path  taken,  and  it  could  easily  be  farther  away  having  taken  another  route. 

As  a  comparison,  the  MINS  path  is  well  within  one  meter  of  truth  through  point 
44.  Its  error  appears  to  grow  slowly  over  the  course  of  the  path,  and  hnishes  just  over 
6  meters  from  the  hnal  point.  A  mean  of  the  point  errors  displays  the  novel  path 
improves  on  the  existing  system  by  over  1.5  meters,  or  35  percent. 


4.5  Map  Comparison 

Mapping  uses  the  linear  Kalman  hlter  state  and  covariance  as  the  control  input 
for  use  alongside  the  LIDAR  range  measurements  in  FastSLAM.  Therefore,  MINS 
exists  to  provide  FastSLAM  with  as  accurate  a  pose  estimate  as  possible,  to  reduce 
the  computation  required  to  maintain  many  particles  within  the  RBPF. 

The  purpose  of  the  RBPF  is  to  store  the  number  of  particles  and  associated  maps 
required  to  account  for  error  in  the  control  input.  In  previous  research,  the  number 
of  particles  is  of  great  signihcance  [47].  How  it  represents  each  of  the  particles  and 
maps  is  also  important,  as  long  paths  desire  a  smaller  number  of  particles  to  limit 
computation  time  [28]. 
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4.5.1  The  Occupancy  Grid. 


The  Qx  X  Qy  meter  grid  with  resolution  Qres  is  just  large  enough  to  capture  the 
hallway  size.  An  important  concern  of  this  implementation  is  how  to  represent  the 
grid  effectively  while  limiting  the  memory  required.  Equation  47  describes  the  total 
memory  required  for  M  particle  maps,  where  b  is  the  number  of  bytes  required  for 
each  grid  cell. 


(aowTAI  (47) 

(yres) 

This  implementation  Erst  limits  the  map  size  by  representing  each  cell  as  the 
smallest  basic  data  type  available.  This  is  a  single  byte  (8  bit)  character,  leaving 
2®  =  Qpr  values  available  for  occupancy.  Substituting  6=1  and  the  grid  values  from 
Table  5  into  Equation  47,  each  occupancy  grid  requires  70  x  70  over  0.1^  bytes  of 
memory.  Table  7  displays  the  memory  requirements  for  the  number  of  particles  M 
given  by  powers  of  ten. 


Table  7.  Map  Memory  Requirements 


M 

memory 

1 

479 

KB 

10 

4.67 

MB 

100 

46.7 

MB 

1000 

467 

MB 

10000 

4.56 

GB 

Many  particles  still  causes  the  memory  requirement  to  grow  ont  of  feasibility  for 
snch  a  small  map.  Even  for  1000  particles,  the  total  memory  reqnirements  for  the 
implementation  remain  on  the  order  of  megabytes.  This  amonnt  of  memory  is  hardly 
a  problem  for  modern  compnters.  However,  this  means  increasing  the  nnmber  of 
particles  to  10000  results  in  gigabytes  of  memory.  Modern  compnters  can  handle  this 
much  storage,  but  FastSLAM  requires  all  maps  to  be  readily  accessible.  This  either 
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requires  an  expensive  machine  with  signihcant  memory  in  hardware  or  signihcant 
increase  in  computation  time  as  memory  swaps. 

Each  cell  of  the  occupancy  grid  contains  a  value  that  corresponds  to  the  probability 
that  cell  is  occupied.  As  the  majority  of  cells  are  never  observed,  the  majority  of  maps 
is  typically  an  medium  shade  of  gray.  As  the  vehicle  scans  its  environment,  FastSLAM 
decreases  the  probability  of  the  grid  cells  it  scanned  through  and  increases  those  where 
the  scan  stopped.  The  result  is  white  space  between  black  walls,  with  the  area  behind 
it  medium  gray  until  the  vehicle  explores  the  area. 

4.5.2  Output  Comparison. 

FastSLAM  builds  a  map  from  any  path  and  range  scan  it  receives  as  input.  The 
occupancy  grid  map  is  an  easily  seen  indication  of  accuracy,  but  it  is  difficult  to  com¬ 
pare  quantitatively  without  comparing  the  path  as  done  in  the  previous  section  [10]. 
This  is  the  case  even  after  considering  a  visual  comparison  of  the  grid  to  a  floor  plan, 
if  a  trusted  one  exists. 

An  occupancy  grid  image  is  straightforward  to  measure  and  easily  interpreted  by 
an  operator.  It  clearly  displays  hallways,  rooms,  obstacles,  doorways,  and  even  closing 
doors  [61].  The  maps  generated  using  this  implementation  clarify  the  problems  with 
a  path,  as  heading  errors  are  visible  through  applying  range  scans. 

To  illustrate  the  relative  accuracy  of  the  input  paths,  the  implementation  produces 
maps  given  different  possible  input  paths  and  the  same  range  scans.  Figures  15,  16, 
and  17  display  maps  obtained  given  the  various  paths  discussed  from  odometry,  FV- 
SIFT,  and  MINS,  respectively.  This  implementation  generated  these  maps  without 
using  the  RBPF  to  generate  multiple  noisy  particles.  This  section  compares  these 
maps  to  the  FastSLAM  output  in  Figure  20. 

The  odometry  map  in  Figure  15  reflects  the  steering  error  of  the  wheels.  Accurate 
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Figure  15.  Map  produced  with  the  odometry  path  and  range  scans.  The  structure  of 
the  hallway  is  clearly  visible,  but  it  appears  skewed. 

with  respect  to  distance,  its  heading  curves  the  hallways  off  the  area.  This  would  be 
the  input  for  a  typical  SLAM  solution  using  only  odometry. 

It  is  theoretically  possible  for  a  RBPF  to  create  an  exact  path  from  this  odom¬ 
etry  input.  However,  it  would  require  precisely  tuned  motion  model  a  values  to 
account  for  the  enormous  odometry  error.  With  accurate  settings,  such  large  param¬ 
eter  values  would  most  likely  demand  millions  of  particles  to  represent  a  large  enough 
distribution.  Any  implementation  using  this  number  of  particles  would  require  an 
unreasonable  amount  of  calculation  time  to  produce  a  consistent  path. 

FV-SIFT  displays  a  more  accurate  heading  in  Figure  16.  It  suffers  from  different 
problems  not  affecting  the  odometry  path,  such  as  each  of  its  corners  being  uncertain. 
Most  signihcantly,  the  error  in  the  northeast  corner  produces  a  false  open  area  which 
actually  corresponds  to  the  northern  hallway.  This  error  propagates  to  the  rest  of  the 
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Figure  16.  Map  generated  with  the  FV-SIFT  path.  It  displays  straight,  slanted  hallways 
due  to  significant  inaccuracies  in  the  corners,  especially  the  northeast  [23,  65]. 

path,  even  if  it  returns  close  to  the  starting  location. 

The  MINS  path  in  Figure  17  is  more  believable.  This  reflects  greater  conhdence 
in  the  location  of  walls  and  corners,  and  is  the  desired  result  from  a  filtered  solution 
like  FastSLAM.  Slowly  accumulating  inaccuracies  move  the  angles  of  later  hallways 
from  closing  the  loop,  so  this  path  is  also  not  free  of  errors. 

These  maps  show  the  MINS  path  is  far  superior  to  a  solution  that  uses  just  the 
odometry.  It  is  not  subject  all  the  problems  of  the  existing  system,  and  maintains  a 
much  more  accurate  heading  than  skid  steering  odometry.  The  advantage  of  accurate 
sensors  such  as  the  LIDAR  and  a  good  path  is  the  ability  to  reduce  the  number  of 
particles  and  be  able  to  process  paths  on  loops  such  as  these  hallways. 
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Figure  17.  Map  generated  using  the  MINS  path.  It  displays  more  accurate  hallways 
than  odometry,  more  accurate  corners  than  FV-SIFT,  and  is  close  to  closing  the  loop. 

4.5.3  Particle  Filter  Output. 

The  design  of  the  RBPF  within  a  FastSLAM  implementation  introduces  controlled 
noise  to  correct  its  input  path.  When  given  control  input  m,  each  particle  [m]  G  P 
carries  a  slightly  different  path  from  that  of  the  input  from  the  motion  model.  The 
RBPF  uses  its  measurement  model  to  adjust  the  error  and  weights  of  particle  set  P 
before  it  resamples.  This  maintains  particles  whose  path  matches  closest  to  range 
scan  observations  ;2. 

An  IBM  laptop  computer  with  2  GHz  Intel®  Pentium®  M  processor  and  2  GB  of 
memory  executes  the  second  phase  of  the  implementation,  including  the  MINS  and 
FastSLAM  algorithms.  As  FastSLAM  scales  linearly  with  respect  to  the  number  of 
particles,  this  machine  takes  4  to  5  minutes  for  every  10  particles  used  in  executing 
the  10  minute  data  set. 
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Figure  18.  Map  from  the  FastSLAM  implementation  using  odometry  as  input.  It 
straightens  the  crooked  hallways  but  not  enough  to  be  accurate  or  close  the  loop. 

The  final  FastSLAM  output  is  the  map  from  the  particle  carrying  the  highest 
weight,  thus  the  particle  most  accurate  to  the  true  path.  For  comparison,  when 
control  input  u  comes  from  odometry,  FastSLAM  produces  the  map  in  Figure  18. 
FastSLAM  generated  this  map  using  100  particles. 

The  effects  of  the  FastSLAM  motion  and  measurement  models  improves  the  odom¬ 
etry  path  significantly.  However,  generating  this  map  requires  a  larger  number  of  par¬ 
ticles  to  represent  the  odometry  error,  and  FastSLAM  cannot  remove  the  rightwards 
curve  of  the  hallways.  Thus,  the  path  returns  close  to  the  starting  location  but  is 
unable  to  accurately  close  the  loop. 

The  FV-SIFT  path  is  more  accurate  in  position  than  the  odometry  path,  but  is 
also  less  consistent.  To  illustrate  these  effects,  the  FV-SIFT  path  is  provides  its  path 
to  the  FastSLAM  implementation.  When  FastSLAM  takes  its  control  input  u  from 
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Figure  19.  Map  from  the  FastSLAM  implementation  using  the  FV-SIFT  path  as  input. 
Errors  in  the  input  path  carry  though  FastSLAM  with  disastrous  effects. 

the  FV-SIFT  output  path,  it  produces  the  map  in  Figure  19,  also  using  100  particles. 

This  map  suffers  from  the  FV-SIFT  path  errors  in  the  northeast  corner  already 
discussed.  However,  this  error  is  much  more  difficult  to  predict  than  odometry  error 
as  done  in  the  motion  model.  The  result  is  that  FastSLAM  is  unable  to  correct  these 
errors  and  they  result  in  a  map  that  is  less  accurate  than  the  input  path  depicted  in 
Figure  16. 

Using  the  additional  sensors  in  the  MINS  system  improves  the  control  input  u 
provided  to  FastSLAM  as  presented  in  this  research.  Figure  20  displays  the  map 
generated  by  this  implementation  using  all  parameters  and  settings  presented.  Fast¬ 
SLAM  generated  this  map  using  only  30  particles. 

This  map  is  not  perfect,  most  evident  by  the  difference  in  the  northeast  corner  as 
the  vehicle  completes  the  loop.  Unfortunately,  early  errors  in  the  path  adversely  affect 
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Figure  20.  Map  from  the  highest  weighted  particle  from  the  FastSLAM  implementation 
using  MINS  as  input.  It  places  its  hallways  straight  and  returns  to  the  initial  location. 


each  pose  afterward,  however  subtle  they  are.  However,  the  FastSLAM  algorithm 
using  the  MINS  path  closes  the  loop  within  a  meter  of  error. 

Even  without  being  perfect,  the  FastSLAM  output  is  a  noticeable  improvement  to 
the  MINS  output  used  to  produce  the  map  in  Figure  17.  Most  importantly,  FastSLAM 
successfully  matches  scanned  walls  as  it  returns  to  the  northeast  corner.  Both  the 
MINS  position  and  the  odometry  FastSLAM  position  at  this  same  time  is  farther  away 
and  at  an  angle,  failing  to  close  the  loop.  Table  8  compares  the  current  best  particle 
path  from  the  odomery  FastSLAM  and  the  MINS  FastSLAM  implementations.  Both 
path  errors  are  measured  from  the  same  hallway  survey  points  displayed  earlier. 


At  each  point,  the  ontput  path  from  the  FastSLAM  implementation  decreases 
the  error  of  its  input.  The  position  of  the  best  current  particle  is  accurate  to  within 
a  meter  at  all  measurable  locations,  with  an  average  under  half  of  a  meter  from 


Table  8.  FastSLAM  Error 


Map 

Location 

Odometry 
FastSLAM 
error  (m) 

MINS 
FastSLAM 
error  (m) 

Start 

0.0 

0.0 

NE 

0.359 

0.269 

NW 

0.128 

0.584 

SW 

0.898 

0.742 

SE 

2.452 

0.863 

NE 

3.709 

0.096 

Einish 

6.000 

0.298 

mean 

1.935 

0.408 

the  survey  points.  Using  the  MINS  path  not  only  requires  30%  of  the  odometry 
FastSLAM  computation  time  by  reducing  the  number  of  particles,  but  improves  on 
its  path  by  reducing  the  error  by  79  percent. 

4.5.4  Additional  Testing. 

To  validate  both  the  MINS  system  and  the  FastSLAM  implementation,  the  testing 
procedure  provides  another  data  set  collected  by  the  same  vehicle.  All  algorithm 
settings  are  held  to  the  same  values,  making  the  only  change  the  input  data.  This 
testing  ensures  the  parameter  settings  and  models  used  are  not  specihc  to  one  data 
set  and  can  be  extended  to  other  situations. 

In  this  additional  data,  the  same  vehicle  traveled  the  same  environment  taking  a 
different  path.  The  vehicle  begins  in  the  same  starting  location  and  completes  the 
same  loop.  Instead  of  then  turning  off  into  the  room,  the  vehicle  begins  another  loop 
retracing  its  path  along  the  hallways  before  hnishing  in  the  southeast  corner.  This 
collection  lasts  10  minutes  and  42  seconds  due  to  farther  distance  traveled  and  less 
time  stopped. 

The  paths  measured  by  egomotion,  IMU  integration,  and  odometry  are  similar 
to  the  other  data.  The  MINS  path  is  also  comparable  as  it  uses  these  as  input,  but 
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Figure  21.  Paths  from  the  MINS  system  and  the  FV-SIFT  program  on  the  additional 
data  set,  shown  over  the  building  floor  plan.  FV-SIFT  provides  a  more  accurate  path, 
but  MINS  provides  a  smoother  path. 


FV-SIFT  is  notably  more  accurate  as  it  avoids  the  severe  feature  loss  it  experienced 
in  the  previous  run.  Figure  21  shows  the  paths  generated  from  the  FV-SIFT  and 
MINS  systems  over  the  same  floor  plan. 

There  is  no  loss  of  features  in  the  first  corner,  so  the  FV-SIFT  path  is  closer 
to  truth  than  the  MINS  path  at  most  points.  This  is  not  indicate  a  failure  of  the 
MINS  system,  however,  as  closer  inspection  reveals  that  the  FV-SIFT  path  is  still 
more  jagged  than  the  MINS  path.  This  is  often  accepted  for  a  navigation  solution, 
but  results  in  signihcant  motion  model  noise  and  thus  a  difficult  path  when  used  as 
SLAM  input. 
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Figure  22.  Map  from  the  highest  weighted  FastSLAM  particle  using  the  additional 
data  MINS  path  as  input.  It  completes  the  loop  then  continues  matching  the  map  to 
previously  traveled  environment. 

Using  the  MINS  path  for  FastSLAM  input  produces  another  map  of  the  same  area 
from  the  additional  data  set.  Figure  22  displays  the  map  generated  using  60  particles 
and  unchanged  implementation  settings. 

This  map  shows  that  FastSLAM  also  closes  the  loop  for  this  data,  returning  to 
previously  traveled  areas  before  losing  accuracy  in  the  southwest  corner.  The  windows 
in  this  corner  at  the  LIDAR  height  causes  this  problem,  as  the  light  transmits  through 
the  glass  and  returns  a  maximum  range  value.  This  results  in  an  open  area  visible 
in  the  maps,  but  also  results  in  a  less  certain  measurement  model.  When  the  scan 
distance  is  a  maximum,  the  model  does  not  use  this  scan  as  the  sensor  does  not  observe 
any  obstacles  to  construct  particle  weights  from.  Without  the  ability  to  properly 
weight  the  particles,  none  are  correct  as  the  vehicle  takes  the  corner  a  second  time. 

The  additional  data  covers  many  of  the  same  survey  points  in  a  slightly  different 


order.  The  primary  difference  is  the  number  of  repeated  locations  as  the  vehicle 
travels  around  the  loop  a  second  time.  Table  9  compares  the  error  at  the  surveyed 
points  for  the  FV-SIFT,  MINS,  and  FastSLAM  paths. 


Table  9.  Additional  Data  Error 


Map 

Location 

Survey 

Point 

FV-SIFT 
error  (m) 

MINS 
error  (m) 

FastSLAM 
error  (m) 

Start 

50 

0.0 

0.0 

0.0 

NE 

34 

0.763 

0.527 

0.503 

NW 

44 

1.609 

2.312 

0.747 

SW 

83 

3.772 

5.786 

1.599 

SE 

64 

4.030 

5.964 

0.861 

NE 

34 

2.549 

2.833 

0.471 

NW 

44 

2.550 

5.028 

0.779 

SW 

83 

6.014 

9.912 

1.107 

Finish 

63 

5.235 

9.103 

5.506 

mean 

2.947 

4.607 

1.286 

As  discussed,  the  FV-SIFT  path  is  closer  to  truth  than  the  MINS  path  for  this 
data.  This  is  partly  due  to  Table  9  displaying  only  errors  at  specihc  points  and  not 
differences  in  heading.  FastSLAM  produces  a  much  more  accurate  path  than  both 
when  using  MINS  for  input,  decreasing  MINS  error  by  72%  to  average  56%  less  error 
than  FV-SIFT. 

4.6  Summary 

The  FastSLAM  in  this  research  does  not  produce  the  ideal  path,  but  is  remarkably 
close  and  produces  a  much  better  map  than  any  other  presented.  Most  importantly, 
the  map  in  Figure  20  is  suitable  for  mission  planning  and  accurate  enough  for  local 
area  navigation. 

This  FastSLAM  map  is  achieved  from  the  data  gathered  by  a  vehicle  operating 
stereo  cameras,  an  IMU,  odometry,  and  a  LIDAR.  This  vehicle  is  driven  manually 
to  gather  all  sensor  data  around  an  indoor  environment  consisting  of  a  hallway  loop. 
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The  collection  results  in  images  and  text  files  as  input  to  the  MINS  and  FastSLAM 
implementations. 

The  implementation  runs  in  two  phases.  The  first  phase  processes  the  stereo 
images  and  extracts  an  egomotion  path.  The  second  phase  combines  the  egomotion 
path  with  an  integrated  IMU  path  and  vehicle  odometry  in  the  MINS  system.  It 
then  uses  the  MINS  path  and  LIDAR  ranges  to  execute  FastSLAM  and  construct 
occupancy  grid  maps  of  the  environment. 
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V.  Conclusion  Sz.  Future  Work 


When  the  environment  is  unknown,  the  SLAM  problem  states  that  a  vehicle  must 
construct  a  map  detailing  its  surroundings  while  using  the  map  it  is  building  to 
maintain  an  accurate  location.  Such  a  vehicle  is  faced  with  the  fundamentally  difficult, 
and  circularly  dehned  SLAM  problem.  This  research  presents  the  hrst  SLAM  solution 
to  integrate  stereo  cameras,  an  IMU,  and  odometry  into  a  MINS  path,  then  combines 
the  MINS  path  with  a  LIDAR  in  a  FastSLAM  implementation.  A  P2-AT8  equipped 
with  these  sensors  completed  an  indoor  hallway  loop  generated  real  data  for  testing. 
The  MINS  path  is  more  accurate  than  the  FV-SIFT  path  for  navigation,  and  the 
FastSLAM  map  is  more  accurate  than  an  implementation  using  only  the  odometry 
and  LIDAR. 

Both  the  MINS  and  FastSLAM  implementations  presented  in  this  document  suc¬ 
ceed  in  producing  accurate  maps,  but  remain  a  reach  from  being  perfect.  Accordingly, 
this  chapter  discusses  the  various  positive  and  negative  components  in  the  implemen¬ 
tation.  Later,  it  covers  several  related  topics  that  may  be  used  in  the  future  to 
improve  on  this  implementation  or  in  areas  related  to  SLAM  and  navigation. 

5.1  Conclusions 

The  MINS  system  presented  in  this  research  is  effective  at  its  purpose  but  is  also 
complicated  to  compute.  Creating  and  operating  Kalman  Liters  requires  a  certain 
area  of  expertise,  separate  from  knowledge  of  robotics  and  mapping  algorithms. 

The  Kalman  Liter  implemented  is  remarkably  simple  compared  to  similar  solutions 
such  as  FV-SIFT  [23,  65].  The  MINS  Liter  could  most  likely  be  replaced  to  similar 
eLect  by  a  weighted  average  of  inputs  as  this  is  the  same  process  the  Liter  carries  out 
with  diLerent  computational  requirements. 
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The  most  time  consuming  and  intensive  aspect  of  this  research  surrounds  the  fea¬ 
ture  tracking  and  egomotion  aspects  of  using  cameras  for  navigation.  Interestingly 
enough,  the  camera  images  are  also  the  most  difficult  input  to  use  and  produce  the 
least  recognizable  path.  This  cannot  simply  be  a  difficulty  of  working  with  cameras, 
as  other  research  has  presented  egomotion  more  successfully  [55,  56].  FV-SIFT  ac¬ 
complishes  a  mostly  accurate  path  using  just  cameras  and  the  IMU  using  either  SIFT 
or  SURF,  whereas  this  egomotion  has  only  been  attempted  using  SIFT.  Instead,  there 
appears  to  be  a  rather  significant  improvement  in  the  methods  presented  here  that  is 
yet  to  be  discovered. 

In  stark  comparison,  the  odometry  and  IMU  sensors  both  involve  very  simple  cal¬ 
culations  to  process  compared  to  more  complicated  image  methods.  By  its  nature, 
odometry  records  data  in  the  exact  format  desired  for  navigation.  The  IMU  integra¬ 
tion  requires  only  a  handful  of  values  and  operations  in  0(1)  constant  time.  Both  are 
feasible  for  a  real-time  online  navigation  solution,  even  on  a  very  slow  machine. 

More  interesting  than  their  convenience  is  that  the  odometry  and  IMU  paths  in 
this  research  ended  up  being  more  accurate  than  the  egomotion  path.  Not  only  is 
the  IMU  used  to  collect  the  data  a  small  consumer  grade  unit,  but  the  vehicle  skid 
steering  is  not  the  ideal  odometry  hardware.  Other  vehicles  produce  odometry  that 
is  approximately  accurate  as  the  MINS  path  presented  here  without  the  additional 
sensors  or  computation. 

Accurate  odometry  provides  a  great  improvement  to  the  accuracy  of  a  mapping 
solution,  but  constant  indoor  conditions  are  not  always  available.  A  major  success  of 
the  research  presented  is  the  combination  of  inaccurate  odometry  that  drifts  and  a 
small,  inexpensive  IMU  to  achieve  a  path  much  more  accurate  in  position  than  IMU 
and  more  accurate  heading  than  odometry.  Based  upon  the  sensor  data,  these  two 
sensors  may  be  functional  without  using  the  images  for  egomotion. 
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This  gives  rise  to  the  possibility  that  the  cameras  may  be  too  much  of  a  burden 
in  certain  situations.  If  MINS  can  incorporate  odometry  and  inertial  measnrements 
together,  it  would  not  reqnire  the  stereo  cameras  or  all  its  associated  calculations. 
This  arrangement  may  simplify  the  implementation,  but  it  also  removes  one  of  the 
sensors  that  provides  an  additional  observation  to  sense  the  environment.  Removing 
this  observation  lowers  the  durability  of  the  system  by  making  it  rely  more  heavily 
on  less  sensors. 

Experimenting  with  less  or  different  sensors  would  be  a  good  area  for  future  SLAM 
research  to  look  into.  Along  these  lines,  the  remainder  of  this  chapter  discusses  this 
possibility  and  several  others  for  continning  this  area  of  SLAM  research  and  vehicle 
navigation. 

5.2  Future  Work 

There  are  two  main  goals  when  it  comes  to  mobile  navigation  and  SLAM  solntions. 
One  is  to  improve  on  the  accuracy  and  consistency  of  solntions,  and  the  other  is  to 
reduce  the  computation  time.  Snggestions  for  fntnre  SLAM  research  to  improve  one 
or  the  other  logically  divide  themselves  into  one  of  these  areas.  The  remainder  of  this 
chapter  discusses  the  potential  topics  nnder  these  two  research  objectives. 

5.2.1  Mapping  for  Accuracy. 

To  improve  the  accuracy  of  a  SLAM  solution,  previous  results  lean  toward  use  of 
a  RBPF  regardless  of  sensors  or  map  representation.  Not  only  has  it  produced  the 
most  accurate  maps  when  compared  to  other  strategies,  but  it  allows  the  user  to  vary 
the  number  of  particles  M,  adjusting  the  algorithm  to  any  anticipated  situation  [61]. 

The  most  accurate  results  are  those  that  leverage  the  most  accurate  sensors.  Use 
of  tuned  odometry  such  as  that  be  the  PowerBot  help,  but  a  more  important  method 


94 


is  to  incorporate  a  scan  matching  algorithm  [28,  29].  Regardless  of  the  control  input 
path,  scanner  technology  such  as  the  LIDAR  unit  used  is  much  more  accurate  than 
the  compounding  error  from  any  odometry  or  inertial  path.  Adjusting  the  motion 
model  by  scan  matching  moves  the  particle  distribution  only  to  the  most  likely  areas. 

One  way  to  achieve  a  similar  effect  would  be  to  incorporate  RBPF  particle  error 
back  into  the  MINS  hlter.  This  only  applies  to  this  implementation  with  an  initial 
sensor  fusion  hlter  and  a  following  mapping  hlter,  but  would  likely  improve  its  par¬ 
ticle  distribution  in  the  same  manner  as  LIDAR  scan  matching.  After  the  RBPF 
weights  each  of  its  particles,  those  with  higher  weights  form  a  new  mean  pose  Sj. 
This  mean  could  replace  or  ahect  Kalman  hlter  state  either  before  or  after  the 
RBPF  resamples. 

An  alternative  to  a  more  standard  odometry  and  range  scan  approach  would 
consider  diherent  types  of  sensors.  The  previous  section  discussed  the  possibility  of 
removing  the  cameras  from  the  MINS  system,  leaving  odometry  and  an  IMU  for 
control  inputs  that  could  be  as  ehective  as  the  solution  presented  here. 

One  might  also  consider  using  higher  quality  components  to  improve  the  raw  data 
keeping  a  similar  algorithm.  A  larger  IMU  can  have  order  of  magnitude  improvements 
in  measurement  accuracy  over  the  model  used,  so  this  advantage  could  certainly  be 
leveraged  in  this  implementation  [65].  One  could  also  take  the  opposite  approach, 
using  more  cheap  components  to  keep  costs  low.  Using  two  inertial  units  on  either  side 
of  vehicle  allows  translation  and  rotation  to  be  interpreted  between  both  readings. 

A  diherent  sensor  arrangement  would  transfer  a  range  scanner  to  a  new  platform 
with  cameras  and  an  IMU,  removing  odometry  as  a  sensor.  The  success  of  the  im¬ 
age  aided  inertial  system  has  already  been  demonstrated  for  navigation  on  airborne 
platforms  [23,  65].  Incorporating  a  range  scanner  to  a  light  aircraft  brings  SLAM 
capability  to  a  highly  mobile  platform  bringing  the  advantages  of  an  airborne  vehicle 
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to  the  SLAM  problem. 

Much  of  the  research  using  cameras  for  mapping  seeks  to  bring  the  SLAM  problem 
down  in  scale.  Rather  than  have  a  massive  vehicle  plotting  areas  the  size  of  university 
campuses,  cameras  seek  to  lower  the  cost  and  change  the  sensor  type  in  potentially 
hazardous  environments.  Future  research  done  with  passive  sensors  allows  vehicles 
to  step  away  from  range  based  solutions  using  expensive  LIDAR  scanners. 

Cameras  are  passive  sensors,  those  that  do  not  project  energy  into  the  environ¬ 
ment.  Range  scanners  like  radar,  sonar,  or  LIDAR  units  do  this  by  their  operation. 
The  effect  of  shooting  so  much  energy  has  unpredictable  consequences  depending  on 
what  a  vehicle  encounters. 

Testing  environments  generally  consist  of  empty  hallways  or  solid  obstacles,  but 
this  is  not  the  case  for  SLAM  application  environments.  The  repetitive  bursts  of 
light  or  sound  can  not  only  damage  items  around  of  the  vehicle,  but  also  affect  the 
environment  and  leave  evidence  of  intrusion.  In  hostile  environments,  active  sensors 
reveal  the  presence  vehicle  as  they  are  easily  detected.  An  exploring  vehicle  eqnipped 
with  passive  sensors  does  not  snffer  this  vnlnerability. 

5.2.2  Mapping  for  Speed. 

If  the  main  goal  of  a  RBPF  is  to  represent  each  possible  pose  a  vehicle  can  have, 
the  secondary  goal  of  a  RBPF  implementation  is  to  reduce  the  number  of  particles 
and  still  do  this  [46].  Reducing  the  nnmber  of  particles  in  a  Liter  enables  FastSLAM 
to  prodnce  a  map  faster.  Of  course,  dropping  the  nnmber  of  particles  down  to  1  does 
not  represent  any  distribution  other  than  a  single  belief  as  presented  [47]. 

A  nice  aspect  of  FastSLAM  is  that  a  more  accurate  belief  affords  a  faster  solution, 
as  it  reqnires  less  particles  to  cover  a  smaller  distribntion  area.  Researchers  showed  the 
effectiveness  of  scan  matching  by  redncing  the  RBPF  to  10  or  30  particles  for  a  variety 
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of  different  data  sets  [28,  29] .  This  allows  them  to  compute  the  map  much  faster  than  if 
the  same  implementation  used  hundreds  of  particles.  With  either  number  of  particles, 
the  map  may  not  be  noticeably  different.  The  challenging  ground  remains  the  balance 
of  speed  versus  maintaining  enough  to  produce  an  accurate  map  consistently. 

For  the  stereo  vision  piece,  there  are  several  ways  to  improve  the  speed  of  cal¬ 
culation.  One  is  to  use  a  different  feature  tracking  algorithm,  as  this  contributes 
signihcantly  to  the  image  processing  time.  Using  HOG  for  feature  extraction  instead 
of  SIFT  or  SURF  has  been  presented  to  take  less  time  and  achieve  comparable  re¬ 
sults  [14,  13].  Reducing  feature  extraction  time  would  have  a  great  affect  on  the  time 
to  obtain  egomotion. 

Another  implementation  could  also  limit  the  region  of  feature  extraction  using 
input  from  other  sensors  like  the  IMU  [65].  This  system  also  limits  the  number 
of  features  tracked  to  speed  processing  time.  As  an  effect,  the  system  only  seeks 
matches  to  its  tracked  features  and  avoids  unnecessary  calculations.  Implementation 
on  a  specihc  computer  also  speeds  this  process,  something  not  accomplished  in  this 
research  [23]. 

A  common  goal  remains  to  implement  a  SLAM  solution  in  real-time.  Researchers 
have  already  produced  maps  whose  computation  time  is  less  than  the  data  collec¬ 
tion  [29].  However,  developments  have  yet  to  present  an  actively  running  SLAM  test, 
or  market  a  vehicle  featuring  SLAM  as  part  of  its  robotic  capabilities. 
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